DuoSpaceNet: Leveraging Both Bird's-Eye-View and Perspective View Representations for 3D Object Detection
Zhe Huang, Yizhe Zhao, Hao Xiao, Chenyan Wu, Lingting Ge

TL;DR
DuoSpaceNet introduces a unified framework combining bird's-eye-view and perspective-view features for improved 3D object detection from multi-view camera data, outperforming existing methods.
Contribution
It fully integrates BEV and PV features within a single detection pipeline, including a novel decoder and feature enhancement strategies, advancing multi-view 3D perception.
Findings
Outperforms BEV and PV baselines on nuScenes
Enhances 3D detection accuracy
Improves BEV map segmentation
Abstract
Multi-view camera-only 3D object detection largely follows two primary paradigms: exploiting bird's-eye-view (BEV) representations or focusing on perspective-view (PV) features, each with distinct advantages. Although several recent approaches explore combining BEV and PV, many rely on partial fusion or maintain separate detection heads. In this paper, we propose DuoSpaceNet, a novel framework that fully unifies BEV and PV feature spaces within a single detection pipeline for comprehensive 3D perception. Our design includes a decoder to integrate BEV and PV features into unified detection queries, as well as a feature enhancement strategy that enriches different feature representations. In addition, DuoSpaceNet can be extended to handle multi-frame inputs, enabling more robust temporal analysis. Extensive experiments on nuScenes dataset show that DuoSpaceNet surpasses both BEV-based…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsRobotics and Sensor-Based Localization · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques
