Learning Ego-Centric BEV Representations from a Perspective-Privileged View: Cross-View Supervision for Online HD Map Construction
Daniel Lengerer, Mathias Pechinger, Klaus Bogenberger, Carsten Markgraf

TL;DR
This paper introduces Cross-View Supervision (CVS), a training paradigm that transfers geometric priors from overhead views to improve ego-centric BEV representations for HD map construction without altering the inference architecture.
Contribution
CVS aligns BEV representations from camera inputs with perspective-privileged overhead views, enhancing structural coherence and long-range accuracy in HD map tasks.
Findings
CVS improves mAP by 3.9 in standard regions and 9.9 in extended regions.
CVS maintains camera-only inference while leveraging overhead supervision during training.
CVS achieves a 44% relative gain in BEV map accuracy at long range.
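
The findings quote both absolute mAP gains and a relative gain, and the two are easy to conflate. The sketch below shows how they relate. It assumes, which this summary does not confirm, that the 44% relative gain refers to the same extended-region setting as the 9.9 absolute gain; the function names are illustrative, and the implied baseline is a back-of-the-envelope figure, not a number reported here.

```python
def relative_gain(baseline, improved):
    """Relative improvement of `improved` over `baseline` (0.44 means 44%)."""
    return (improved - baseline) / baseline


def implied_baseline(absolute_gain, rel_gain):
    """Baseline score implied by an absolute gain and a relative gain."""
    return absolute_gain / rel_gain


# Assumption: the 9.9 absolute gain and the 44% relative gain describe
# the same extended-region evaluation.
base = implied_baseline(9.9, 0.44)
improved = base + 9.9
print(f"implied baseline: {base:.1f}, implied CVS score: {improved:.1f}")
print(f"relative gain check: {relative_gain(base, improved):.2f}")
```

If the assumption holds, the extended-region baseline would sit near 22.5 mAP, which illustrates why the same training signal can yield a much larger relative gain at long range than in the standard region.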
Abstract
Bird's-eye-view (BEV) representations derived from multi-camera input have become a central interface for online high-definition (HD) map construction. However, most approaches rely solely on ego-centric supervision, requiring large-scale scene structure to be inferred from incomplete observations, occlusions, and diminishing information density at long range, where perspective effects and spatial sparsity hinder consistent structural reasoning. We introduce Cross-View Supervision (CVS), a representation learning paradigm that transfers geometric and topological priors from an ego-aligned overhead perspective into camera-based BEV encoders. Rather than adding auxiliary semantic losses, CVS aligns representations in a shared BEV feature space and distills globally consistent structural knowledge from a perspective-privileged teacher into the ego-centric backbone. This supervision…
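
To make the idea of aligning representations in a shared BEV feature space concrete, here is a minimal sketch of a feature-alignment loss between a camera-based student and an overhead-view teacher. It is an illustration only: the abstract as excerpted does not state the actual loss, so the per-cell cosine distance, the function name `bev_alignment_loss`, the optional validity mask, and the tensor shapes are all assumptions.

```python
import numpy as np


def bev_alignment_loss(student, teacher, mask=None, eps=1e-8):
    """Mean (1 - cosine similarity) between per-cell BEV feature vectors.

    student, teacher: arrays of shape (C, H, W) on the same ego-aligned grid.
    mask: optional (H, W) weights; cells with weight 0 are ignored.
    Hypothetical loss for illustration, not the paper's stated objective.
    """
    if student.shape != teacher.shape:
        raise ValueError("student and teacher must share one BEV grid")
    # Cosine similarity along the channel axis, one value per BEV cell.
    dot = (student * teacher).sum(axis=0)
    norms = np.linalg.norm(student, axis=0) * np.linalg.norm(teacher, axis=0)
    per_cell = 1.0 - dot / (norms + eps)
    if mask is None:
        return float(per_cell.mean())
    mask = mask.astype(per_cell.dtype)
    return float((per_cell * mask).sum() / max(mask.sum(), eps))


# Toy features: a student close to the teacher versus an unrelated one.
rng = np.random.default_rng(0)
teacher = rng.normal(size=(16, 8, 8))
student = teacher + 0.1 * rng.normal(size=teacher.shape)
loss_close = bev_alignment_loss(student, teacher)
loss_random = bev_alignment_loss(rng.normal(size=teacher.shape), teacher)
```

In a training framework, such a term would typically be added to the map-construction loss with the teacher features held fixed, so that only the camera encoder is updated. Because the teacher appears only in the loss, it can be dropped at inference, which is consistent with the camera-only deployment described above.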
