BEVCon: Advancing Bird's Eye View Perception with Contrastive Learning
Ziyang Leng, Jiawei Yang, Zhicheng Ren, Bolei Zhou

TL;DR
BEVCon introduces a contrastive learning framework that improves Bird's Eye View (BEV) perception in autonomous driving, with reported gains of up to +2.4% mAP on nuScenes and better 3D detection, segmentation, and trajectory prediction performance.
Contribution
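The paper proposes a contrastive learning approach for BEV perception built from two modules: an instance feature contrast module that refines BEV features and a perspective view contrast module that enhances the image backbone. The focus is on representation learning to improve feature quality, rather than on BEV encoder or task-head design.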
Findings
Achieves up to +2.4% mAP improvement on the nuScenes dataset.
Enhances BEV features and backbone representations through contrastive modules.
Demonstrates the importance of representation learning in BEV perception.
Abstract
We present BEVCon, a simple yet effective contrastive learning framework designed to improve Bird's Eye View (BEV) perception in autonomous driving. BEV perception offers a top-down-view representation of the surrounding environment, making it crucial for 3D object detection, segmentation, and trajectory prediction tasks. While prior work has primarily focused on enhancing BEV encoders and task-specific heads, we address the underexplored potential of representation learning in BEV models. BEVCon introduces two contrastive learning modules: an instance feature contrast module for refining BEV features and a perspective view contrast module that enhances the image backbone. The dense contrastive learning designed on top of detection losses leads to improved feature representations across both the BEV encoder and the backbone. Extensive experiments on the nuScenes dataset demonstrate that…
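The abstract does not spell out the loss used by either contrastive module, so the following is only a minimal sketch of a generic InfoNCE-style contrastive objective, the kind of loss such modules are commonly built on. It assumes each row of `anchors` has exactly one matching row in `positives` (for example, two feature vectors for the same object instance) and treats all other rows as negatives. The function name `info_nce_loss`, the `temperature` value, and the random features are hypothetical and not taken from the paper.

```python
import numpy as np


def info_nce_loss(anchors, positives, temperature=0.1):
    """Generic InfoNCE loss (illustrative; not the paper's exact formulation).

    Row i of `anchors` is pulled toward row i of `positives` and pushed
    away from every other row of `positives`.
    """
    # L2-normalize so the dot product below is a cosine similarity.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)

    # Pairwise similarities scaled by the temperature: shape (N, N).
    logits = (a @ p.T) / temperature

    # Numerically stable row-wise log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The correct match for row i sits on the diagonal.
    return float(-np.mean(np.diag(log_prob)))


# Hypothetical stand-ins for per-instance feature vectors.
rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 32))

# Matching pairs: each positive is a lightly perturbed copy of its anchor.
aligned = info_nce_loss(feats, feats + 0.01 * rng.normal(size=feats.shape))

# Mismatched pairs: positives are in reversed order.
shuffled = info_nce_loss(feats, feats[::-1].copy())
```

In this toy setup the loss is small when each anchor is paired with its own perturbed copy and much larger when the pairing is scrambled, which is the signal a contrastive module adds on top of the detection losses.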
