Camera-Only Bird's Eye View Perception: A Neural Approach to LiDAR-Free Environmental Mapping for Autonomous Vehicles
Anupkumar Bochare

TL;DR
This paper introduces a novel camera-only perception system for autonomous vehicles that generates Bird's Eye View maps by integrating object detection and monocular depth estimation, eliminating the need for expensive LiDAR sensors.
Contribution
The paper extends the Lift-Splat-Shoot architecture, combining YOLOv11 object detection with DepthAnythingV2 monocular depth estimation for comprehensive scene understanding using only cameras.
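To make the combination of detection and depth concrete, here is a minimal geometric sketch of the "lift" idea for a single detection. It is not the paper's implementation. It assumes a pinhole camera with intrinsics K, a known 4x4 camera-to-ego transform, a box in (x1, y1, x2, y2) pixel format, and a depth map that already holds metric depth along the optical axis. If the depth model outputs relative rather than metric depth, that output would first need to be converted and scaled. The function name and the choice of the box's bottom-centre pixel as the ground-contact point are illustrative assumptions.

```python
import numpy as np


def lift_detection_to_ego(box, depth_map, K, T_cam_to_ego):
    """Hypothetical helper: place one 2D detection on the BEV plane.

    box:          (x1, y1, x2, y2) bounding box in pixels (assumed format).
    depth_map:    HxW array, assumed metric depth along the camera z-axis.
    K:            3x3 pinhole intrinsic matrix.
    T_cam_to_ego: 4x4 homogeneous transform from camera frame to ego frame.
    Returns the (x, y) ego-frame position, i.e. a point in BEV.
    """
    x1, y1, x2, y2 = box
    h, w = depth_map.shape
    # Bottom-centre of the box approximates where the object meets the road.
    u = int(np.clip(round((x1 + x2) / 2), 0, w - 1))
    v = int(np.clip(round(y2), 0, h - 1))
    z = float(depth_map[v, u])
    # Back-project: X_cam = z * K^-1 [u, v, 1]^T
    p_cam = z * (np.linalg.inv(K) @ np.array([u, v, 1.0]))
    # Move the point into the ego frame and drop height for the BEV view.
    p_ego = T_cam_to_ego @ np.append(p_cam, 1.0)
    return float(p_ego[0]), float(p_ego[1])
```

Sampling a single pixel keeps the sketch short; a more robust variant might take the median depth over the lower part of the box to reduce sensitivity to depth noise at object boundaries.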
Findings
Achieves up to 85% road segmentation accuracy
Achieves 85-90% vehicle detection rates against LiDAR ground truth
Maintains average positional errors around 1.2 meters
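The summary does not specify how these metrics are computed. The sketch below shows one common way to score BEV outputs against ground truth. It is an assumption, not the paper's evaluation protocol. Predicted and ground-truth object positions are matched greedily by distance within a hypothetical `max_dist` threshold, which gives a detection rate and a mean positional error in metres. Road masks are scored with both pixel accuracy and IoU, because "segmentation accuracy" could refer to either.

```python
import numpy as np


def match_and_score(pred_xy, gt_xy, max_dist=2.0):
    """Greedy one-to-one matching of BEV positions (max_dist is an assumed threshold).

    Returns (detection_rate, mean_position_error_m).
    """
    pred = np.asarray(pred_xy, dtype=float).reshape(-1, 2)
    gt = np.asarray(gt_xy, dtype=float).reshape(-1, 2)
    if len(gt) == 0:
        return float("nan"), float("nan")
    if len(pred) == 0:
        return 0.0, float("nan")
    # Pairwise Euclidean distances, shape (num_gt, num_pred).
    dists = np.linalg.norm(gt[:, None, :] - pred[None, :, :], axis=-1)
    used_gt, used_pred, errors = set(), set(), []
    # Visit pairs from closest to farthest; each object is matched at most once.
    for flat in np.argsort(dists, axis=None):
        i, j = np.unravel_index(flat, dists.shape)
        if dists[i, j] > max_dist:
            break  # every remaining pair is even farther apart
        if i in used_gt or j in used_pred:
            continue
        used_gt.add(i)
        used_pred.add(j)
        errors.append(dists[i, j])
    detection_rate = len(errors) / len(gt)
    mean_error = float(np.mean(errors)) if errors else float("nan")
    return detection_rate, mean_error


def road_mask_scores(pred_mask, gt_mask):
    """Pixel accuracy and IoU of a binary BEV road mask."""
    pred = np.asarray(pred_mask, dtype=bool)
    gt = np.asarray(gt_mask, dtype=bool)
    pixel_acc = float(np.mean(pred == gt))
    union = np.logical_or(pred, gt).sum()
    iou = float(np.logical_and(pred, gt).sum() / union) if union else float("nan")
    return pixel_acc, iou
```

Pixel accuracy can look high on BEV grids dominated by background cells. IoU is usually the stricter of the two numbers.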
Abstract
Autonomous vehicle perception systems have traditionally relied on costly LiDAR sensors to generate precise environmental representations. In this paper, we propose a camera-only perception framework that produces Bird's Eye View (BEV) maps by extending the Lift-Splat-Shoot architecture. Our method combines YOLOv11-based object detection with DepthAnythingV2 monocular depth estimation across multi-camera inputs to achieve comprehensive 360-degree scene understanding. We evaluate our approach on the OpenLane-V2 and NuScenes datasets, achieving up to 85% road segmentation accuracy and 85-90% vehicle detection rates when compared against LiDAR ground truth, with average positional errors limited to 1.2 meters. These results highlight the potential of deep learning to extract rich spatial information using only camera inputs, enabling cost-efficient autonomous navigation without sacrificing…
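The 360-degree BEV map comes from fusing several cameras into one top-down grid. The following is a deliberately simplified sketch of that "splat" step. It assumes each camera's outputs have already been lifted to ego-frame (x, y) points, and it uses a hypothetical 100 m x 100 m grid at 0.5 m resolution. The original Lift-Splat-Shoot splats learned per-pixel features weighted by a predicted depth distribution. This version only counts points per cell, to show the rasterisation geometry.

```python
import numpy as np


def splat_to_bev(points_per_camera, x_range=(-50.0, 50.0),
                 y_range=(-50.0, 50.0), resolution=0.5):
    """Rasterise ego-frame (x, y) points from several cameras into one BEV grid.

    Grid extent and resolution are illustrative assumptions.
    Returns an integer count grid; rows index x (forward), columns index y (left).
    """
    nx = int(round((x_range[1] - x_range[0]) / resolution))
    ny = int(round((y_range[1] - y_range[0]) / resolution))
    grid = np.zeros((nx, ny), dtype=np.int32)
    for pts in points_per_camera:
        pts = np.asarray(pts, dtype=float).reshape(-1, 2)
        ix = np.floor((pts[:, 0] - x_range[0]) / resolution).astype(int)
        iy = np.floor((pts[:, 1] - y_range[0]) / resolution).astype(int)
        # Drop points that fall outside the mapped area.
        keep = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
        # np.add.at accumulates correctly when several points hit the same cell.
        np.add.at(grid, (ix[keep], iy[keep]), 1)
    return grid
```

Because every camera writes into the same ego-frame grid, overlapping fields of view simply add up in shared cells. A learned variant would sum feature vectors instead of counts.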
Taxonomy
Topics: Advanced Neural Network Applications · Autonomous Vehicle Technology and Safety · Robotics and Sensor-Based Localization
