GrounDiff: Diffusion-Based Ground Surface Generation from Digital
Surface Models

Oussema Dhaouadi, Johannes Meier, Jacques Kaiser, Daniel Cremers
DeepScenario · TUM · MCML
WACV 2026

GrounDiff redefines DSM-to-DTM conversion by "denoising" terrain with diffusion, treating buildings and vegetation as noise and delivering cleaner, more precise ground surfaces than state-of-the-art approaches.

Abstract

Digital Terrain Models (DTMs) represent the bare-earth elevation and are important in numerous geospatial applications. Such data models cannot be directly measured by sensors and are typically generated from Digital Surface Models (DSMs) derived from LiDAR or photogrammetry. Traditional filtering approaches rely on manually tuned parameters, while learning-based methods require well-designed architectures, often combined with post-processing. To address these challenges, we introduce Ground Diffusion (GrounDiff), the first diffusion-based framework that iteratively removes non-ground structures by formulating the problem as a denoising task. We incorporate a gated design with confidence-guided generation that enables selective filtering. To increase scalability, we further propose Prior-Guided Stitching (PrioStitch), which employs a downsampled global prior, automatically generated using GrounDiff, to guide local high-resolution predictions. We evaluate our method on the DSM-to-DTM translation task across diverse datasets, showing that GrounDiff consistently outperforms deep learning-based state-of-the-art methods, reducing RMSE by up to 93% on ALS2DTM and up to 47% on USGS benchmarks. In the task of road reconstruction, which requires both high precision and smoothness, our method achieves up to 81% lower distance error compared to specialized techniques on the GeRoD benchmark, while maintaining competitive surface smoothness using only DSM inputs, without task-specific optimization. Our variant for road reconstruction, GrounDiff+, is specifically designed to produce even smoother surfaces, further surpassing state-of-the-art methods.
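The core idea of confidence-gated iterative denoising can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the function names (`grounddiff_sketch`, `denoise_step`, `toy_step`), the gating threshold, and the toy stand-in "network" are hypothetical and not the paper's implementation, which uses a trained diffusion model.

```python
import numpy as np

def grounddiff_sketch(dsm, denoise_step, num_steps=10, conf_threshold=0.5):
    """Confidence-gated iterative denoising of a DSM into a DTM (toy sketch).

    `denoise_step(height_map, t) -> (refined_map, confidence_map)` stands in
    for the trained diffusion network; its signature is an assumption.
    """
    height = dsm.astype(float).copy()
    for t in reversed(range(num_steps)):
        refined, confidence = denoise_step(height, t)
        # Gated update: only overwrite pixels the model confidently
        # identifies as non-ground structures; leave the rest untouched.
        gate = confidence > conf_threshold
        height = np.where(gate, refined, height)
    return height

# Hypothetical stand-in "network": flattens anything well above the median.
def toy_step(h, t):
    ground = np.full_like(h, np.median(h))
    confidence = (h - ground > 1.0).astype(float)
    return ground, confidence

dsm = np.zeros((8, 8))
dsm[2:4, 2:4] = 10.0  # a 2x2 "building" on otherwise flat ground
dtm = grounddiff_sketch(dsm, toy_step, num_steps=1)
```

The gate is what makes the filtering selective: pixels the model deems ground pass through unchanged, so natural terrain is preserved while above-ground structures are regenerated.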

Ground Generation

The samples come from the ALS2DTM and USGS research datasets, which are presented in detail in the publication. The satellite images are sourced from Bing Maps.

Successful Generations

Examples 1–6
[Each example shows four panels: satellite image, input DSM, ground-truth DTM, and predicted DTM.]
GrounDiff reliably removes buildings, vegetation, and other above-ground structures while preserving natural terrain features, even in challenging conditions.

Limitations

Examples 1–3
[Each example shows four panels: satellite image, input DSM, ground-truth DTM, and predicted DTM.]
GrounDiff has difficulties in areas with abrupt elevation changes such as alpine terrain: sharp height gradients can resemble building facades, which leads to misclassification and produces regeneration errors with locally smoothed surfaces. In dense vegetation, where the ground is mostly hidden, the model tends to infer heights from the vegetation canopy because real ground references are missing. The model can also fail entirely when pixel height differences are too small to distinguish above-ground structures from terrain.

Ground Generation for Smooth Road Reconstruction

[Interactive 3D viewer of reconstructed road surfaces. Press G to toggle wireframe; press R to reset the view.]

BibTeX

@inproceedings{dhaouadi2026groundiff,
  title        = {GrounDiff: Diffusion-Based Ground Surface Generation from Digital Surface Models},
  author       = {Dhaouadi, Oussema and Meier, Johannes and Kaiser, Jacques and Cremers, Daniel},
  booktitle    = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
  year         = {2026}
}