GOLD-BEV: GrOund and aeriaL Data for Dense Semantic BEV Mapping of Dynamic Scenes

Joshua Niemeijer; Alaa Eddine Ben Zekri; Reza Bahmanyar; Philipp M. Schmälzle; Houda Chaabouni-Chouayakh; Franz Kurz

arXiv:2604.19411 · cs.CV · April 22, 2026

TL;DR

GOLD-BEV is a framework that learns dense, scene-centric semantic BEV maps of dynamic road scenes from ego-centric sensors, using time-synchronized aerial imagery as supervision only during training, which enables scalable and consistent semantic mapping.

Contribution

It introduces a novel approach that combines training-time supervision from aerial imagery with ego-centric sensor input for dense semantic BEV mapping of dynamic scenes.

Findings

01

Achieves dense semantic mapping with minimal manual annotation.

02

Effectively supervises moving traffic participants using synchronized aerial data.

03

Supports pseudo-labeling and synthesis of aerial views from ego sensors.
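The pseudo-labeling and aerial-synthesis findings can be illustrated with a minimal sketch. The excerpted abstract does not specify the training objective, so everything below is an assumption: an aerial teacher's per-cell class probabilities are thresholded at a hypothetical confidence `tau` into hard BEV pseudo-labels (low-confidence cells are ignored), and the BEV student is trained with a masked cross-entropy plus an optional L1 term, weighted by a hypothetical `lam`, for synthesizing the aerial view. All names (`make_pseudo_labels`, `masked_cross_entropy`, `joint_loss`, `IGNORE`) are illustrative and not the authors' code.

```python
import numpy as np

IGNORE = 255  # hypothetical label for cells excluded from the loss

def log_softmax(logits, axis=0):
    """Numerically stable log-softmax along the class axis."""
    z = logits - logits.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))

def make_pseudo_labels(teacher_logits, tau=0.9):
    """Turn aerial-teacher scores (C, H, W) on a BEV-aligned crop into hard
    (H, W) labels; cells whose max probability is below tau become IGNORE."""
    prob = np.exp(log_softmax(teacher_logits, axis=0))
    labels = prob.argmax(axis=0).astype(np.int64)
    labels[prob.max(axis=0) < tau] = IGNORE
    return labels

def masked_cross_entropy(student_logits, labels):
    """Mean cross-entropy of student BEV scores (C, H, W) over non-ignored cells."""
    logp = log_softmax(student_logits, axis=0)
    rows, cols = np.nonzero(labels != IGNORE)
    if rows.size == 0:
        return 0.0
    # Pick the log-probability of the pseudo-label class at each valid cell.
    return float(-logp[labels[rows, cols], rows, cols].mean())

def joint_loss(student_logits, labels, pred_aerial=None, aerial=None, lam=1.0):
    """Segmentation loss plus an optional L1 pseudo-aerial reconstruction term."""
    loss = masked_cross_entropy(student_logits, labels)
    if pred_aerial is not None and aerial is not None:
        loss += lam * float(np.abs(pred_aerial - aerial).mean())
    return loss
```

Confidence thresholding is only one common way to filter pseudo-labels; the paper's domain-adapted teachers may produce and filter targets differently.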

Abstract

Understanding road scenes in a geometrically consistent, scene-centric representation is crucial for planning and mapping. We present GOLD-BEV, a framework that learns dense bird's-eye-view (BEV) semantic environment maps, including dynamic agents, from ego-centric sensors, using time-synchronized aerial imagery as supervision only during training. BEV-aligned aerial crops provide an intuitive target space, enabling dense semantic annotation with minimal manual effort and avoiding the ambiguity of ego-only BEV labeling. Crucially, strict aerial-ground synchronization allows overhead observations to supervise moving traffic participants and mitigates the temporal inconsistencies inherent to non-synchronized overhead sources. To obtain scalable dense targets, we generate BEV pseudo-labels using domain-adapted aerial teachers, and jointly train BEV segmentation with optional pseudo-aerial…
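To make "time-synchronized" and "BEV-aligned aerial crop" concrete, here is a minimal sketch under stated assumptions, not GOLD-BEV's actual pipeline: the aerial image is orthorectified and north-up, its ground sampling distance `gsd` (metres per pixel) is known, and the ego pose is given as a pixel position plus a heading `yaw` measured counter-clockwise from east. The hypothetical helper `nearest_aerial_frame` picks the aerial frame closest in time to an ego timestamp within a hypothetical tolerance `max_dt`, and the hypothetical `bev_aligned_crop` resamples the aerial image onto an ego-centric grid. Names, defaults, and axis conventions are illustrative choices.

```python
import numpy as np

def nearest_aerial_frame(ego_t, aerial_ts, max_dt=0.05):
    """Index of the aerial frame closest in time to ego_t (seconds), or None
    if no frame lies within max_dt (a hypothetical tolerance)."""
    ts = np.asarray(aerial_ts, dtype=float)
    i = int(np.argmin(np.abs(ts - ego_t)))
    return i if abs(ts[i] - ego_t) <= max_dt else None

def bev_aligned_crop(aerial, ego_px, yaw, gsd,
                     bev_size=(200, 200), bev_res=0.5, fill=0):
    """Resample a north-up aerial image (H, W[, C]) onto an ego-centric BEV grid.

    ego_px  : (col, row) pixel position of the ego vehicle in `aerial`
    yaw     : heading in radians, counter-clockwise from east (+col)
    gsd     : aerial ground sampling distance, metres per pixel
    bev_res : BEV cell size in metres; the ego sits at the grid centre,
              facing "up" (row 0 is the farthest-forward row)
    """
    h, w = bev_size
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Metric offsets of each cell centre in the ego frame (x forward, y left).
    x_fwd = (h / 2 - rows - 0.5) * bev_res
    y_left = (w / 2 - cols - 0.5) * bev_res
    # Rotate into east/north using the ego heading.
    east = x_fwd * np.cos(yaw) - y_left * np.sin(yaw)
    north = x_fwd * np.sin(yaw) + y_left * np.cos(yaw)
    # Convert metres to aerial pixels (rows grow southward), nearest neighbour.
    src_c = np.rint(ego_px[0] + east / gsd).astype(int)
    src_r = np.rint(ego_px[1] - north / gsd).astype(int)
    valid = ((src_r >= 0) & (src_r < aerial.shape[0])
             & (src_c >= 0) & (src_c < aerial.shape[1]))
    # Cells that fall outside the aerial image keep the fill value.
    out = np.full((h, w) + aerial.shape[2:], fill, dtype=aerial.dtype)
    out[valid] = aerial[src_r[valid], src_c[valid]]
    return out
```

Nearest-neighbour lookup is used so the same function can resample integer label maps (for example, aerial pseudo-labels) without blending class IDs; for RGB targets, bilinear sampling would usually give smoother crops.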
