Top2Ground: A Height-Aware Dual Conditioning Diffusion Model for Robust Aerial-to-Ground View Generation

Jae Joong Lee; Bedrich Benes

arXiv:2511.08258·cs.CV·November 12, 2025

Top2Ground: A Height-Aware Dual Conditioning Diffusion Model for Robust Aerial-to-Ground View Generation

Jae Joong Lee, Bedrich Benes

PDF

Open Access

TL;DR

Top2Ground is a diffusion model that generates realistic ground-view images from aerial images by leveraging height and semantic information, improving geometric and semantic consistency.

Contribution

It introduces a height-aware dual conditioning diffusion approach that directly synthesizes ground images without intermediate 3D representations.

Findings

01

Achieves 7.3% higher SSIM on benchmarks

02

Handles diverse viewpoints and occlusions effectively

03

Demonstrates strong generalization across datasets

Abstract

Generating ground-level images from aerial views is a challenging task due to extreme viewpoint disparity, occlusions, and a limited field of view. We introduce Top2Ground, a novel diffusion-based method that directly generates photorealistic ground-view images from aerial input images without relying on intermediate representations such as depth maps or 3D voxels. Specifically, we condition the denoising process on a joint representation of VAE-encoded spatial features (derived from aerial RGB images and an estimated height map) and CLIP-based semantic embeddings. This design ensures the generation is both geometrically constrained by the scene's 3D structure and semantically consistent with its content. We evaluate Top2Ground on three diverse datasets: CVUSA, CVACT, and the Auto Arborist. Our approach shows 7.3% average improvement in SSIM across three benchmark datasets, showing…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsAdvanced Vision and Imaging · Robotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques