TACOcc: Target-Adaptive Cross-Modal Fusion with Volume Rendering for 3D Semantic Occupancy
Luyao Lei, Shuo Xu, Yifan Bai, Xing Wei

TL;DR
TACOcc introduces an adaptive cross-modal fusion method with volume rendering supervision to improve 3D semantic occupancy prediction, addressing geometry-semantics mismatch and surface detail loss in multi-modal data.
Contribution
The paper proposes a novel target-scale adaptive bidirectional retrieval mechanism and an improved volume rendering pipeline for enhanced multi-modal 3D occupancy prediction.
Findings
Outperforms existing methods on nuScenes and SemanticKITTI benchmarks.
Effectively aligns features across modalities with adaptive neighborhood expansion and shrinking.
Enhances surface detail reconstruction via volume rendering supervision.
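To make the volume rendering finding concrete, here is a minimal sketch of standard alpha compositing along a single camera ray, the usual way to turn per-sample densities into a rendered depth and a rendered semantic distribution that 2D supervision can be compared against. This is the generic formulation, not necessarily the paper's improved pipeline; the function name `render_ray` and all argument names are assumptions made for illustration.

```python
import numpy as np

def render_ray(sigma, deltas, depths, sem_probs):
    """Composite samples along one ray (generic volume rendering, hypothetical helper).

    sigma:     (N,) non-negative densities at the N samples
    deltas:    (N,) distances between consecutive samples
    depths:    (N,) depth of each sample along the ray
    sem_probs: (N, C) per-sample class probabilities
    """
    sigma = np.asarray(sigma, dtype=float)
    deltas = np.asarray(deltas, dtype=float)
    depths = np.asarray(depths, dtype=float)
    sem_probs = np.asarray(sem_probs, dtype=float)

    # Opacity of each sample.
    alpha = 1.0 - np.exp(-sigma * deltas)
    # Transmittance: probability the ray reaches sample i unobstructed.
    trans = np.concatenate(([1.0], np.cumprod(1.0 - alpha)[:-1]))
    # Contribution of each sample to the rendered pixel.
    weights = trans * alpha

    depth = float(np.sum(weights * depths))   # expected depth along the ray
    semantics = weights @ sem_probs           # (C,) rendered class scores
    return weights, depth, semantics
```

With an empty first sample and a dense second sample, almost all of the weight falls on the second sample, so the rendered depth and class scores come from the first occupied surface. A loss between these rendered quantities and 2D depth or segmentation targets is one common way to supervise surface detail when 3D labels are sparse.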
Abstract
The performance of multi-modal 3D occupancy prediction is limited by ineffective fusion, mainly due to geometry-semantics mismatch from fixed fusion strategies and surface detail loss caused by sparse, noisy annotations. The mismatch stems from the heterogeneous scale and distribution of point cloud and image features, leading to biased matching under fixed neighborhood fusion. To address this, we propose a target-scale adaptive, bidirectional symmetric retrieval mechanism. It expands the neighborhood for large targets to enhance context awareness and shrinks it for small ones to improve efficiency and suppress noise, enabling accurate cross-modal feature alignment. This mechanism explicitly establishes spatial correspondences and improves fusion accuracy. For surface detail loss, sparse labels provide limited supervision, resulting in poor predictions for small objects. We introduce an…
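The adaptive neighborhood idea described in the abstract can be illustrated with a small sketch: the retrieval radius grows with the target's scale and shrinks for small targets, and features of the other modality inside that radius are pooled. Everything here is an assumption for illustration: the linear scale-to-radius mapping, the clipping bounds, the mean pooling, and the names `adaptive_radius` and `retrieve` are hypothetical and are not taken from the paper.

```python
import numpy as np

def adaptive_radius(target_scale, base=1.0, r_min=0.5, r_max=4.0):
    """Map a target scale to a search radius (hypothetical linear rule, clipped)."""
    return float(np.clip(base * target_scale, r_min, r_max))

def retrieve(query, candidates, feats, radius):
    """Mean-pool features of candidates within `radius` of `query`.

    query:      (D,) location of the querying element (e.g. a projected point)
    candidates: (M, D) locations of elements from the other modality
    feats:      (M, F) features attached to those candidates
    Returns the pooled (F,) feature and the number of neighbors used.
    """
    query = np.asarray(query, dtype=float)
    candidates = np.asarray(candidates, dtype=float)
    feats = np.asarray(feats, dtype=float)

    dist = np.linalg.norm(candidates - query, axis=1)
    mask = dist <= radius
    if not mask.any():
        # No neighbor in range: return a zero feature rather than failing.
        return np.zeros(feats.shape[1]), 0
    return feats[mask].mean(axis=0), int(mask.sum())
```

Calling `retrieve` once with points querying image locations and once with the roles swapped gives a simple bidirectional variant. In this sketch a small target uses few neighbors, which limits noise, while a large target pools over a wider context.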
Taxonomy
Topics: Advanced Vision and Imaging · 3D Shape Modeling and Analysis · Computer Graphics and Visualization Techniques
