Partitioning large watersheds into sub-basins with RichDEM
Partitioning large watersheds into sub-basins with RichDEM follows a deterministic, raster-based hydrological routing sequence: condition the DEM, compute flow direction (D8 or D∞), generate a flow accumulation surface, apply a drainage threshold to isolate the stream network, and segment contiguous contributing areas. RichDEM’s C++ backend leverages memory-mapped I/O and OpenMP parallelization, enabling reproducible sub-basin extraction on multi-gigabyte DEMs without proprietary GIS overhead. This pipeline integrates directly into broader Watershed Delineation & Catchment Synchronization workflows, providing a scalable foundation for regional hydrologic modeling, flood routing, and water quality assessment.
Environment & Dependency Matrix
RichDEM’s Python bindings wrap compiled C++ routing algorithms and rely on strict GDAL alignment. Environment mismatches are the primary cause of silent raster corruption or segmentation faults.
| Component | Minimum Version | Configuration Notes |
|---|---|---|
| Python | 3.8–3.11 | Python 3.12+ may require a conda-forge rebuild of richdem |
| RichDEM | 0.3.4 | Install via conda install -c conda-forge richdem for precompiled wheels |
| GDAL | 3.4.0 | Python GDAL binding version must match the system GDAL library |
| NumPy | 1.21 | Required for array interoperability and threshold masking |
| OS | Linux/macOS/Windows | Windows requires MSVC runtime; conda environments are strongly recommended |
| Parallelism | OpenMP 4.0+ | Set OMP_NUM_THREADS before importing RichDEM to control CPU allocation |
Critical memory note: RichDEM loads rasters into RAM by default. For DEMs exceeding 8–12 GB, pre-tile with gdal_translate -co TILED=YES -co BLOCKXSIZE=512 -co BLOCKYSIZE=512 and process tiles independently.
Production Pipeline Code
The following script implements a complete, automated sub-basin partitioning pipeline. It assumes the input DEM has already undergone hydrological conditioning (sink filling) or performs it inline.
import os
import numpy as np
import rasterio
import richdem as rd
import warnings
# Set parallelism BEFORE importing RichDEM modules that use OpenMP
os.environ["OMP_NUM_THREADS"] = "8"
warnings.filterwarnings("ignore", category=FutureWarning)
def partition_subbasins(
dem_path: str,
out_dir: str,
accumulation_threshold: int = 1000,
method: str = "D8"
) -> None:
"""
Partition a large watershed into sub-basins using RichDEM.
Parameters
----------
dem_path : str
Path to hydrologically conditioned DEM (GeoTIFF)
out_dir : str
Output directory for intermediate and final rasters
accumulation_threshold : int
Minimum flow accumulation cells to define a stream
method : str
Routing algorithm: 'D8' or 'Dinf'
"""
os.makedirs(out_dir, exist_ok=True)
# Capture spatial reference from the source GeoTIFF before richdem loads it
print(f"[1/5] Loading DEM: {dem_path}")
with rasterio.open(dem_path) as src:
ref_transform = src.transform
ref_crs = src.crs
dem = rd.LoadGDAL(dem_path, no_data=-9999)
if dem is None:
raise RuntimeError("Failed to load DEM. Verify GDAL driver support and file integrity.")
# 2. Fill depressions (Priority-Flood)
print("[2/5] Filling depressions...")
dem = rd.FillDepressions(dem, in_place=False)
# 3. Compute flow accumulation
# rd.FlowAccumulation accepts the conditioned DEM; the routing method is
# specified here and used internally to compute direction then accumulation.
routing_method = "Dinf" if method.upper() in ("DINF", "D-INF") else "D8"
print(f"[3/5] Computing {routing_method} flow accumulation...")
acc = rd.FlowAccumulation(dem, method=routing_method)
# 4. Threshold to isolate stream network
print(f"[4/5] Extracting streams (threshold >= {accumulation_threshold} cells)...")
acc_np = np.array(acc)
streams = (acc_np >= accumulation_threshold).astype(np.int32)
# 5. Compute flow direction for watershed segmentation
print("[5/5] Labeling sub-basins...")
fd = rd.FlowDirection(dem, method=routing_method)
subbasins = rd.Watersheds(fd, streams)
# Export all outputs with preserved geospatial metadata
_export_raster(np.array(subbasins), os.path.join(out_dir, "subbasins.tif"), ref_transform, ref_crs)
_export_raster(acc_np, os.path.join(out_dir, "flow_accum.tif"), ref_transform, ref_crs)
_export_raster(streams, os.path.join(out_dir, "stream_network.tif"), ref_transform, ref_crs)
print("Pipeline complete. Outputs saved to:", out_dir)
def _export_raster(
data: np.ndarray,
out_path: str,
transform,
crs,
nodata: float = -9999.0
) -> None:
"""Export an array to GeoTIFF using a captured rasterio transform and CRS."""
with rasterio.open(
out_path,
"w",
driver="GTiff",
height=data.shape[0],
width=data.shape[1],
count=1,
dtype=data.dtype,
crs=crs if crs is not None else "EPSG:4326",
transform=transform,
compress="LZW",
tiled=True,
blockxsize=512,
blockysize=512,
nodata=nodata
) as dst:
dst.write(data, 1)
Memory & Parallelism Optimization
Large-scale hydrological routing is I/O and memory-bound. To prevent out-of-memory crashes or thread contention:
- Pre-tile source DEMs: Use
gdal_translatewithTILED=YESandBLOCKXSIZE=512before processing. Tiled GeoTIFFs align with RichDEM’s internal chunking strategy, reducing disk seek overhead. See GDAL GeoTIFF Driver Options for configuration details. - Control OpenMP threads: Set
OMP_NUM_THREADSto match physical cores (not logical threads). Hyperthreading rarely improves raster routing performance and can increase cache thrashing. - Avoid intermediate copies: RichDEM returns
rdArrayobjects that share memory with underlying buffers. Do not call.copy()unless explicitly modifying values; otherwise, you will double memory consumption. - Use in_place=True cautiously:
FillDepressions(dem, in_place=True)modifies the input array directly, which saves memory but prevents you from comparing the original and filled DEMs for validation.
Validation & Integration
Automated partitioning requires post-processing validation to ensure hydrological realism:
- Drainage Density Check: Calculate total stream length per unit area. Values significantly outside regional norms (e.g., 0.5–3.0 km/km² for temperate basins) indicate threshold misalignment or DEM artifacts.
- Gauge Alignment: Overlay USGS/NHD stream networks or field-mapped catchment boundaries. Sub-basin outlets should align with known monitoring stations or confluence points.
- Threshold Sensitivity: The
accumulation_thresholdparameter controls basin granularity. Lower values produce finer, more fragmented sub-basins; higher values merge tributaries into larger units. For methodological guidance on threshold selection, consult Basin Partitioning Strategies. - Algorithm Choice: D8 routing is computationally efficient and suitable for coarse-resolution DEMs (>10 m). D∞ distributes flow across two downslope cells, reducing artificial channelization but increasing runtime by ~15–20%. Refer to the RichDEM Documentation for algorithm-specific parameters and edge-case handling.
When integrating outputs into distributed hydrologic models (SWAT, HEC-HMS, or custom Python frameworks), ensure CRS consistency, verify that -9999 nodata values are respected during raster math, and maintain a version-controlled record of the threshold and routing method used. This guarantees reproducibility and simplifies catchment synchronization across multi-temporal datasets.