Robust Bird's Eye View Segmentation by Adapting DINOv2
Merve Rabia Barın, Görkay Aydemir, Fatma Güney

TL;DR
This paper enhances Bird's Eye View perception in autonomous driving by adapting DINOv2 with Low Rank Adaptation, significantly improving robustness against corruptions like weather and camera failures, while also reducing training complexity.
Contribution
It introduces a novel adaptation of DINOv2 for BEV estimation using LoRA, improving robustness and efficiency in BEV perception tasks.
Findings
Increased robustness under various corruptions
Faster convergence during training
Fewer learnable parameters needed
Abstract
Extracting a Bird's Eye View (BEV) representation from multiple camera images offers a cost-effective, scalable alternative to LIDAR-based solutions in autonomous driving. However, the performance of the existing BEV methods drops significantly under various corruptions such as brightness and weather changes or camera failures. To improve the robustness of BEV perception, we propose to adapt a large vision foundational model, DINOv2, to BEV estimation using Low Rank Adaptation (LoRA). Our approach builds on the strong representation space of DINOv2 by adapting it to the BEV task in a state-of-the-art framework, SimpleBEV. Our experiments show increased robustness of BEV perception under various corruptions, with increasing gains from scaling up the model and the input resolution. We also showcase the effectiveness of the adapted representations in terms of fewer learnable parameters and…
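To make the LoRA idea in the abstract concrete, the following is a minimal sketch of a single linear layer with a low-rank update, written in plain Python so it has no dependencies. It is not the authors' implementation: the class name `LoRALinear`, the helper `matmul`, the initialization scales, and the rank and `alpha` values are all assumptions chosen for illustration. The pretrained weight `W` stays frozen, and only the two small factors `A` and `B` would be trained.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


class LoRALinear:
    """Hypothetical layer: frozen weight W plus a trainable low-rank update B @ A."""

    def __init__(self, d_in, d_out, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        # Stand-in for a pretrained weight (d_out x d_in); frozen during adaptation.
        self.W = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(d_out)]
        # Trainable factors. A (rank x d_in) starts small and random,
        # B (d_out x rank) starts at zero, so the adapted layer
        # initially reproduces the frozen layer exactly.
        self.A = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank  # assumed scaling convention

    def forward(self, x):
        """Compute x @ W^T + scale * (x @ A^T) @ B^T for a batch of row vectors."""
        base = matmul(x, transpose(self.W))
        update = matmul(matmul(x, transpose(self.A)), transpose(self.B))
        return [
            [b + self.scale * u for b, u in zip(base_row, update_row)]
            for base_row, update_row in zip(base, update)
        ]

    def trainable_params(self):
        """Number of entries in the low-rank factors A and B."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

    def frozen_params(self):
        """Number of entries in the frozen weight W."""
        return len(self.W) * len(self.W[0])


# Illustrative sizes only; not taken from the paper.
layer = LoRALinear(d_in=64, d_out=64, rank=4)
print(layer.trainable_params(), layer.frozen_params())
```

For a square layer of width d and rank r, the factors hold 2·d·r entries against d² in the frozen weight, which is where the reduction in learnable parameters comes from when r is much smaller than d.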
Taxonomy
Topics: Image Processing Techniques and Applications
