VOIC: Visible-Occluded Integrated Guidance for 3D Semantic Scene Completion
Zaidao Han, Risa Higashita, Jiang Liu

TL;DR
VOIC introduces a dual-decoder framework that explicitly separates visible-region perception from occluded-region reasoning, improving single-image 3D semantic scene completion for autonomous driving.
Contribution
It proposes an offline Visible Region Label Extraction (VRLE) strategy and a dual-decoder network that together strengthen visible-region perception and occluded-region reasoning in 3D scene completion.
Findings
Outperforms existing methods on the SemanticKITTI and KITTI-360 benchmarks.
Achieves state-of-the-art accuracy in both geometric and semantic scene completion.
Separating visible-region perception from occluded-region reasoning yields measurable gains over joint end-to-end completion.
Abstract
Camera-based 3D Semantic Scene Completion (SSC) is a critical task for autonomous driving and robotic scene understanding. It aims to infer a complete 3D volumetric representation of both semantics and geometry from a single image. Existing methods typically focus on end-to-end 2D-to-3D feature lifting and voxel completion. However, they often overlook the interference between high-confidence visible-region perception and low-confidence occluded-region reasoning caused by single-image input, which can lead to feature dilution and error propagation. To address these challenges, we introduce an offline Visible Region Label Extraction (VRLE) strategy that explicitly separates and extracts voxel-level supervision for visible regions from dense 3D ground truth. This strategy purifies the supervisory space for two complementary sub-tasks: visible-region perception and occluded-region…
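The abstract describes VRLE only at a high level: visible-region supervision is extracted offline from the dense 3D ground truth. The sketch below illustrates one way such an extraction could work, under stated assumptions; it is not the authors' procedure. It assumes visibility is decided by a z-buffer test on projected voxel centers: an occupied voxel counts as visible if it is the nearest occupied voxel landing on some image pixel. The function names `extract_visible_labels` and `split_supervision`, the `EMPTY = 0` and `IGNORE = 255` label conventions, and the rule that the occluded target is "dense GT minus visible voxels" are all hypothetical choices made for illustration.

```python
import numpy as np

EMPTY = 0     # assumption: label 0 marks empty (free) voxels
IGNORE = 255  # assumption: label excluded from a loss term


def extract_visible_labels(gt, voxel_size, origin, K, T_cam_from_world, img_hw):
    """Hypothetical VRLE-style sketch (not the paper's code).

    Keeps the GT label of each occupied voxel that is the nearest occupied voxel
    projecting onto some pixel (z-buffer test on voxel centers); all other voxels
    get IGNORE.
    gt: (X, Y, Z) integer labels; origin: (3,) world position of voxel (0, 0, 0)'s
    corner; K: (3, 3) intrinsics; T_cam_from_world: (4, 4) extrinsics; img_hw: (H, W).
    """
    H, W = img_hw
    occ = np.argwhere(gt != EMPTY)                            # (N, 3) occupied indices
    centers = np.asarray(origin) + (occ + 0.5) * voxel_size   # voxel centers, world frame
    homo = np.hstack([centers, np.ones((len(occ), 1))])
    cam = (T_cam_from_world @ homo.T).T[:, :3]                # voxel centers, camera frame
    front = cam[:, 2] > 0                                     # drop voxels behind the camera
    occ, cam = occ[front], cam[front]

    uvw = (K @ cam.T).T                                       # pinhole projection
    u = np.floor(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.floor(uvw[:, 1] / uvw[:, 2]).astype(int)
    inside = (u >= 0) & (u < W) & (v >= 0) & (v < H)          # keep in-image projections
    occ, depth = occ[inside], cam[inside, 2]
    pix = v[inside] * W + u[inside]                           # flat pixel id per voxel

    visible = np.full_like(gt, IGNORE)
    if len(occ) == 0:
        return visible
    order = np.argsort(depth, kind="stable")                  # nearest voxels first
    _, first = np.unique(pix[order], return_index=True)       # first (nearest) hit per pixel
    hit = occ[order[first]]
    visible[hit[:, 0], hit[:, 1], hit[:, 2]] = gt[hit[:, 0], hit[:, 1], hit[:, 2]]
    return visible


def split_supervision(gt, visible):
    """Hypothetical split into two targets whose non-IGNORE voxels are disjoint:
    the visible target from VRLE, and the dense GT with visible voxels masked out."""
    occluded = gt.copy()
    occluded[visible != IGNORE] = IGNORE
    return visible, occluded
```

For example, with a 1x1x3 column of occupied voxels `[2, 3, 4]` lying along a single camera ray, only the front voxel (label 2) survives in the visible target, and the occluded target becomes `[255, 3, 4]`. Projecting only voxel centers is a coarse approximation: a large voxel near the camera covers many pixels but is tested at one, and free space in front of the first hit is left as `IGNORE` rather than labeled visible-empty. A per-pixel ray march through the grid would address both, at higher offline cost.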
