TL;DR
This paper introduces DualViewMapDet, a camera-only 3D detection and tracking framework that uses prior static point cloud maps to improve localization accuracy in autonomous driving.
Contribution
It proposes a dual-space (PV and BEV) fusion strategy that integrates static map priors into camera-based detection, so that no online depth sensor such as LiDAR is required at inference while 3D localization still improves.
Findings
Consistent improvements over camera-only baselines on nuScenes and Argoverse 2.
Significant gains in object localization accuracy.
Ablation studies validate the effectiveness of PV/BEV fusion and map coverage.
Abstract
Camera-based 3D object detection and tracking are central to autonomous driving, yet precise 3D object localization remains fundamentally constrained by depth ambiguity when no expensive, depth-rich online LiDAR is available at inference. In many deployments, however, vehicles repeatedly traverse the same environments, making static point cloud maps from prior traversals a practical source of geometric priors. We propose DualViewMapDet, a camera-only inference framework that retrieves such map priors online and leverages them to mitigate the absence of a LiDAR sensor during deployment. The key idea is a dual-space camera-map fusion strategy that avoids one-sided view conversion. Specifically, we (i) project the map into perspective view (PV) and encode multi-channel geometric cues to enrich image features and support BEV lifting, and (ii) encode the map directly in bird's-eye view (BEV)…
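To make the two map views described in the abstract concrete, here is a minimal sketch of how a static point cloud map could be turned into a PV input and a BEV input. It is an illustration under stated assumptions, not the authors' implementation: the function names `project_map_to_pv` and `rasterize_map_to_bev`, the pinhole camera model, the single depth channel (the paper encodes multi-channel cues that are not detailed here), and the binary occupancy grid are all hypothetical choices.

```python
import numpy as np

def project_map_to_pv(points, K, T_cam_from_ego, height, width):
    """Project Nx3 ego-frame map points into a sparse depth image (0 = no point).

    Assumes a pinhole camera with intrinsics K (3x3) and a 4x4 rigid
    transform from the ego frame to the camera frame (z pointing forward).
    """
    pts_h = np.hstack([points, np.ones((len(points), 1))])
    cam = (T_cam_from_ego @ pts_h.T).T[:, :3]
    cam = cam[cam[:, 2] > 1e-3]            # keep points in front of the camera
    uvw = (K @ cam.T).T
    u = np.floor(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.floor(uvw[:, 1] / uvw[:, 2]).astype(int)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z = u[inside], v[inside], cam[inside, 2]
    depth = np.full((height, width), np.inf)
    np.minimum.at(depth, (v, u), z)        # nearest map point wins per pixel
    depth[np.isinf(depth)] = 0.0
    return depth

def rasterize_map_to_bev(points, x_range, y_range, resolution):
    """Rasterize Nx3 ego-frame map points into a binary BEV occupancy grid."""
    nx = int(round((x_range[1] - x_range[0]) / resolution))
    ny = int(round((y_range[1] - y_range[0]) / resolution))
    ix = np.floor((points[:, 0] - x_range[0]) / resolution).astype(int)
    iy = np.floor((points[:, 1] - y_range[0]) / resolution).astype(int)
    inside = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    grid = np.zeros((nx, ny), dtype=np.float32)
    grid[ix[inside], iy[inside]] = 1.0     # mark cells containing any map point
    return grid
```

In a full pipeline, the PV depth image would be concatenated with or fused into image features before BEV lifting, and the BEV grid would pass through its own encoder; both fusion steps are outside the scope of this sketch.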
