Learning Ego-Centric BEV Representations from a Perspective-Privileged View: Cross-View Supervision for Online HD Map Construction

Daniel Lengerer; Mathias Pechinger; Klaus Bogenberger; Carsten Markgraf

arXiv:2605.12218 · cs.CV · May 13, 2026

TL;DR

This paper introduces Cross-View Supervision (CVS), a novel training paradigm that transfers geometric priors from overhead views into ego-centric BEV representations for HD map construction, without altering the inference architecture.

Contribution

CVS aligns BEV representations from camera inputs with perspective-privileged overhead views, enhancing structural coherence and long-range accuracy in HD map tasks.

Findings

01. CVS improves mAP by 3.9 in standard regions and by 9.9 in extended regions.

02. CVS keeps inference camera-only; the overhead view is used only as supervision.

03. CVS achieves a 44% relative gain in BEV map accuracy at long range.

Abstract

Bird's-eye-view (BEV) representations derived from multi-camera input have become a central interface for online high-definition (HD) map construction. However, most approaches rely solely on ego-centric supervision, requiring large-scale scene structure to be inferred from incomplete observations, occlusions, and diminishing information density at long range, where perspective effects and spatial sparsity hinder consistent structural reasoning. We introduce Cross-View Supervision (CVS), a representation learning paradigm that transfers geometric and topological priors from an ego-aligned overhead perspective into camera-based BEV encoders. Rather than adding auxiliary semantic losses, CVS aligns representations in a shared BEV feature space and distills globally consistent structural knowledge from a perspective-privileged teacher into the ego-centric backbone. This supervision…
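
The abstract describes aligning student and teacher representations in a shared BEV feature space. As a rough illustration of what such an alignment objective could look like, the sketch below computes a per-cell cosine distance between two BEV feature maps. The loss form, the function name `bev_alignment_loss`, and the `(C, H, W)` layout are assumptions made for this example; the truncated abstract does not specify the actual CVS objective.

```python
import numpy as np


def bev_alignment_loss(student, teacher, eps=1e-8):
    """Mean (1 - cosine similarity) over BEV grid cells.

    Hypothetical objective for illustration only, not the paper's loss.
    student, teacher: arrays of shape (C, H, W) in the same BEV frame.
    """
    if student.shape != teacher.shape:
        raise ValueError("student and teacher BEV features must share a shape")
    # L2-normalise the channel vector at every grid cell.
    s = student / (np.linalg.norm(student, axis=0, keepdims=True) + eps)
    t = teacher / (np.linalg.norm(teacher, axis=0, keepdims=True) + eps)
    cos = (s * t).sum(axis=0)  # shape (H, W), values in [-1, 1]
    return float((1.0 - cos).mean())


# Toy features: 8 channels on a 4x4 BEV grid (made-up sizes).
rng = np.random.default_rng(0)
teacher_feat = rng.standard_normal((8, 4, 4))  # overhead-view teacher (assumed fixed)
student_feat = rng.standard_normal((8, 4, 4))  # camera-based student

loss_random = bev_alignment_loss(student_feat, teacher_feat)
loss_aligned = bev_alignment_loss(teacher_feat, teacher_feat)
loss_scaled = bev_alignment_loss(2.0 * teacher_feat, teacher_feat)
```

In this sketch the loss is near zero when the student reproduces the teacher's features up to a per-cell scale, and it grows as the feature directions diverge. In a training setup the teacher features would typically be treated as constants, so that only the camera-based encoder is updated; that detail is likewise an assumption here.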
