GOLD-BEV: GrOund and aeriaL Data for Dense Semantic BEV Mapping of Dynamic Scenes
Joshua Niemeijer, Alaa Eddine Ben Zekri, Reza Bahmanyar, Philipp M. Schmälzle, Houda Chaabouni-Chouayakh, Franz Kurz

TL;DR
GOLD-BEV is a framework that learns dense, scene-centric BEV maps of dynamic road scenes using aerial imagery supervision during training, enabling scalable and consistent semantic mapping.
Contribution
GOLD-BEV uses time-synchronized aerial imagery as a training-only supervision signal for learning dense semantic BEV maps of dynamic scenes from ego-centric sensors.
Findings
Achieves dense semantic mapping with minimal manual annotation.
Effectively supervises moving traffic participants using synchronized aerial data.
Supports pseudo-labeling and synthesis of aerial views from ego sensors.
Abstract
Understanding road scenes in a geometrically consistent, scene-centric representation is crucial for planning and mapping. We present GOLD-BEV, a framework that learns dense bird's-eye-view (BEV) semantic environment maps, including dynamic agents, from ego-centric sensors, using time-synchronized aerial imagery as supervision only during training. BEV-aligned aerial crops provide an intuitive target space, enabling dense semantic annotation with minimal manual effort and avoiding the ambiguity of ego-only BEV labeling. Crucially, strict aerial-ground synchronization allows overhead observations to supervise moving traffic participants and mitigates the temporal inconsistencies inherent to non-synchronized overhead sources. To obtain scalable dense targets, we generate BEV pseudo-labels using domain-adapted aerial teachers, and jointly train BEV segmentation with optional pseudo-aerial…
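The pseudo-labeling step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a teacher network that outputs per-class probabilities for a BEV-aligned aerial crop. The confidence threshold, the ignore value, the function name `make_pseudo_labels`, and the three-class toy input are all hypothetical choices made for illustration; the abstract does not say how the authors filter or encode their pseudo-labels.

```python
import numpy as np

IGNORE_INDEX = 255  # hypothetical "no label" value, not taken from the paper


def make_pseudo_labels(teacher_probs, min_confidence=0.9):
    """Convert teacher probabilities of shape (C, H, W) into an (H, W) label map.

    Pixels whose highest class probability is below min_confidence are set to
    IGNORE_INDEX so a training loss could skip them. The thresholding rule is
    an assumption, not the paper's stated procedure.
    """
    labels = teacher_probs.argmax(axis=0).astype(np.uint8)
    confidence = teacher_probs.max(axis=0)
    labels[confidence < min_confidence] = IGNORE_INDEX
    return labels


# Toy 2x2 aerial crop with 3 hypothetical classes; probabilities sum to 1 per pixel.
probs = np.array([
    [[0.95, 0.40], [0.05, 0.10]],
    [[0.03, 0.35], [0.92, 0.10]],
    [[0.02, 0.25], [0.03, 0.80]],
])
pseudo = make_pseudo_labels(probs)
print(pseudo)
```

With the default threshold of 0.9, only the two pixels in the left column keep a class label; the other two are marked as ignored. A lower threshold gives denser but noisier targets.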
