Bridging Stereo Geometry and BEV Representation with Reliable Mutual Interaction for Semantic Scene Completion
Bohan Li, Yasheng Sun, Zhujin Liang, Dalong Du, Zhuanghui Zhang, Xiaofeng Wang, Yunnan Wang, Xin Jin, Wenjun Zeng

TL;DR
This paper introduces BRGScene, a unified framework that effectively combines stereo geometry and BEV representations through mutual interaction modules to improve semantic scene completion from limited observations.
Contribution
The paper proposes a novel mutual interactive ensemble framework that bridges stereo geometry and BEV features for dense 3D scene prediction in SSC.
Findings
Outperforms existing camera-based SSC methods on SemanticKITTI.
Effective fusion of stereo geometry and BEV enhances scene completion accuracy.
Mutual guidance and ensemble modules improve feature interaction and prediction quality.
Abstract
3D semantic scene completion (SSC) is an ill-posed perception task that requires inferring a dense 3D scene from limited observations. Previous camera-based methods struggle to predict accurate semantic scenes due to inherent geometric ambiguity and incomplete observations. In this paper, we resort to stereo matching technique and bird's-eye-view (BEV) representation learning to address such issues in SSC. Complementary to each other, stereo matching mitigates geometric ambiguity with epipolar constraint while BEV representation enhances the hallucination ability for invisible regions with global semantic context. However, due to the inherent representation gap between stereo geometry and BEV features, it is non-trivial to bridge them for dense prediction task of SSC. Therefore, we further develop a unified occupancy-based framework dubbed BRGScene, which effectively bridges these two…
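To illustrate how stereo matching reduces geometric ambiguity: in a rectified stereo pair, the epipolar constraint restricts each correspondence to the same image row, so depth follows from horizontal disparity as Z = f * B / d. The sketch below shows only this generic relation and is not BRGScene's implementation; the function name `disparity_to_depth` and the sample focal length and baseline are assumptions chosen for illustration.

```python
import numpy as np


def disparity_to_depth(disparity, focal_px, baseline_m, eps=1e-6):
    """Convert disparity (pixels) to depth (metres) for a rectified stereo pair.

    Hypothetical helper for illustration; not taken from the paper's code.
    Pixels with (near-)zero disparity have no finite depth and are set to inf.
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.full(disparity.shape, np.inf)
    valid = disparity > eps
    # Z = f * B / d, with f in pixels, B in metres, d in pixels
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth


# Illustrative values only (assumed, not a real camera calibration)
sample_disparity = [[10.0, 0.0]]
sample_depth = disparity_to_depth(sample_disparity, focal_px=700.0, baseline_m=0.5)
```

Because depth is inversely proportional to disparity, small matching errors at low disparity produce large depth errors, which is one reason distant and occluded regions still benefit from global context such as a BEV representation.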
Taxonomy
Topics: Advanced Vision and Imaging · Computer Graphics and Visualization Techniques · Advanced Image Processing Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Adam · Absolute Position Encodings · Softmax · Residual Connection · Byte Pair Encoding · Label Smoothing · Position-Wise Feed-Forward Layer
