Collaborative Perceiver: Elevating Vision-based 3D Object Detection via Local Density-Aware Spatial Occupancy
Jicheng Yuan, Manh Nguyen Duc, Qian Liu, Manfred Hauswirth, Danh Le Phuoc

TL;DR
The paper introduces Collaborative Perceiver, a multi-task learning framework that enhances vision-based 3D object detection by incorporating spatial occupancy information and local density features to improve environmental understanding.
Contribution
It proposes a novel pipeline for dense occupancy ground truth generation, a voxel-height-guided sampling strategy, and a global-local feature fusion module to improve BEV representations.
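To make the notion of density-aware occupancy concrete, here is a minimal sketch of one way a per-voxel density value could be attached to occupancy labels. It is not the paper's pipeline: the function names, the 0.5 m voxel size, the `min_points` threshold, and the peak-count normalisation are all illustrative assumptions.

```python
from collections import Counter
from math import floor


def voxel_density(points, voxel_size=0.5):
    """Count how many points fall into each voxel.

    `points` is an iterable of (x, y, z) coordinates in metres.
    The voxel size is a hypothetical choice, not a value from the paper.
    """
    counts = Counter()
    for x, y, z in points:
        idx = (floor(x / voxel_size), floor(y / voxel_size), floor(z / voxel_size))
        counts[idx] += 1
    return counts


def occupancy_with_density(points, voxel_size=0.5, min_points=1):
    """Return {voxel index: density in (0, 1]} for occupied voxels.

    A voxel counts as occupied if it holds at least `min_points` points.
    Density is the point count divided by the largest count in the scene
    (an assumed normalisation, chosen here only for illustration).
    """
    counts = voxel_density(points, voxel_size)
    peak = max(counts.values(), default=1)
    return {idx: n / peak for idx, n in counts.items() if n >= min_points}


if __name__ == "__main__":
    cloud = [(0.1, 0.1, 0.1), (0.2, 0.2, 0.2), (0.6, 0.1, 0.1)]
    print(occupancy_with_density(cloud))
```

In this toy cloud, two points share one voxel and the third sits alone in a neighbouring voxel, so the first voxel receives the higher density value.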
Findings
Achieves 49.5% mAP and 59.2% NDS on the nuScenes benchmark.
Outperforms existing vision-based 3D detection methods.
Demonstrates the effectiveness of occupancy-aware multi-task learning.
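For readers unfamiliar with the NDS figure above, the sketch below shows how the nuScenes Detection Score combines mAP with the five mean true-positive error metrics (translation, scale, orientation, velocity, attribute). The function name and the input values in the usage example are illustrative assumptions; they are not numbers reported in the paper.

```python
def nuscenes_detection_score(map_score, tp_errors):
    """Compute NDS from mAP and the five mean true-positive errors.

    NDS = (5 * mAP + sum(1 - min(1, err) for each TP error)) / 10

    `map_score` is in [0, 1]; `tp_errors` holds mATE, mASE, mAOE, mAVE, mAAE.
    Each error is clipped at 1 before being converted into a score.
    """
    if len(tp_errors) != 5:
        raise ValueError("expected exactly five true-positive error metrics")
    tp_scores = [1.0 - min(1.0, err) for err in tp_errors]
    return (5.0 * map_score + sum(tp_scores)) / 10.0


if __name__ == "__main__":
    # Made-up inputs purely to exercise the formula.
    print(nuscenes_detection_score(0.5, [0.5, 0.5, 0.5, 0.5, 0.5]))
```

Because mAP carries half of the total weight, a detector can raise its NDS either by detecting more objects or by localising the detected ones more precisely.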
Abstract
Vision-based bird's-eye-view (BEV) 3D object detection has advanced significantly in autonomous driving by offering cost-effectiveness and rich contextual information. However, existing methods often construct BEV representations by collapsing extracted object features, neglecting intrinsic environmental contexts, such as roads and pavements. This hinders detectors from comprehensively perceiving the characteristics of the physical world. To alleviate this, we introduce a multi-task learning framework, Collaborative Perceiver (CoP), that leverages spatial occupancy as auxiliary information to mine consistent structural and conceptual similarities shared between 3D object detection and occupancy prediction tasks, bridging gaps in spatial representations and feature refinement. To this end, we first propose a pipeline to generate dense occupancy ground truths incorporating local density…
Taxonomy
Topics: Advanced Neural Network Applications · Visual Attention and Saliency Detection · Robotics and Sensor-Based Localization
