Bridging Stereo Geometry and BEV Representation with Reliable Mutual Interaction for Semantic Scene Completion

Bohan Li, Yasheng Sun, Zhujin Liang, Dalong Du, Zhuanghui Zhang, Xiaofeng Wang, Yunnan Wang, Xin Jin, Wenjun Zeng

arXiv:2303.13959 · cs.CV · May 7, 2024 · 6 cites

TL;DR

This paper introduces BRGScene, a unified framework that effectively combines stereo geometry and BEV representations through mutual interaction modules to improve semantic scene completion from limited observations.

Contribution

The paper proposes a novel mutual interactive ensemble framework that bridges stereo geometry and BEV features for dense 3D scene prediction in semantic scene completion (SSC).

Findings

1. Outperforms existing camera-based SSC methods on SemanticKITTI.
2. Effective fusion of stereo geometry and BEV enhances scene completion accuracy.
3. Mutual guidance and ensemble modules improve feature interaction and prediction quality.
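The findings above on fusion and mutual guidance concern moving information between a stereo-geometry branch and a BEV branch. The sketch below is a generic, hypothetical illustration of such a two-way exchange, not BRGScene's actual mutual guidance or ensemble modules, whose details are not given on this page. It assumes the stereo branch yields a 3D voxel volume with one scalar feature per voxel, collapses that volume to BEV by averaging over height, and blends BEV context back into every voxel of its column with a fixed weight. The names `voxel_to_bev`, `bev_to_voxel`, and `fuse`, as well as the averaging and blending rules, are all assumptions.

```python
from typing import List

Volume = List[List[List[float]]]  # hypothetical layout: [x][y][z], one scalar per voxel
Plane = List[List[float]]         # BEV layout: [x][y]


def voxel_to_bev(volume: Volume) -> Plane:
    """Collapse the vertical (z) axis by averaging each column into one BEV cell."""
    return [[sum(column) / len(column) for column in row] for row in volume]


def bev_to_voxel(bev: Plane, height: int) -> Volume:
    """Lift BEV back to 3D by copying each cell up its vertical column."""
    return [[[cell] * height for cell in row] for row in bev]


def fuse(volume: Volume, bev: Plane, weight: float = 0.5) -> Volume:
    """Blend each voxel with the BEV context of its column (fixed convex weight).

    Hypothetical rule: out = (1 - weight) * voxel + weight * bev_cell.
    """
    height = len(volume[0][0])
    lifted = bev_to_voxel(bev, height)
    return [
        [
            [(1 - weight) * v + weight * b for v, b in zip(vcol, bcol)]
            for vcol, bcol in zip(vrow, brow)
        ]
        for vrow, brow in zip(volume, lifted)
    ]


if __name__ == "__main__":
    stereo_volume = [[[0.0, 2.0], [4.0, 4.0]]]   # 1 x 2 grid, 2 height levels
    bev = voxel_to_bev(stereo_volume)
    print(bev)                                   # [[1.0, 4.0]]
    print(fuse(stereo_volume, bev, weight=0.5))  # [[[0.5, 1.5], [4.0, 4.0]]]
```

In the usage lines the BEV plane is derived from the same volume only to keep the example small; in a real model the BEV plane would typically come from its own branch carrying global semantic context, and the fixed averaging and blending would usually be replaced by learned layers.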

Abstract

3D semantic scene completion (SSC) is an ill-posed perception task that requires inferring a dense 3D scene from limited observations. Previous camera-based methods struggle to predict accurate semantic scenes due to inherent geometric ambiguity and incomplete observations. In this paper, we resort to stereo matching and bird's-eye-view (BEV) representation learning to address these issues in SSC. The two are complementary: stereo matching mitigates geometric ambiguity with the epipolar constraint, while BEV representation enhances the hallucination ability for invisible regions with global semantic context. However, due to the inherent representation gap between stereo geometry and BEV features, it is non-trivial to bridge them for the dense prediction task of SSC. Therefore, we further develop a unified occupancy-based framework dubbed BRGScene, which effectively bridges these two…
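The abstract credits stereo matching with reducing geometric ambiguity through the epipolar constraint. A minimal classical sketch of that idea, assuming a rectified stereo pair so that the match of each left-image pixel lies on the same row of the right image: build a per-pixel matching cost over candidate disparities, keep the lowest-cost disparity, and convert it to depth with Z = f·B/d. Hand-crafted absolute-difference costs and winner-take-all selection are used here as a stand-in; this is not BRGScene's learned stereo branch, and the function names, focal length, and baseline are hypothetical.

```python
from typing import List, Sequence

INF = float("inf")


def row_cost_volume(left_row: Sequence[float], right_row: Sequence[float],
                    max_disp: int) -> List[List[float]]:
    """Absolute-difference matching cost for each pixel x and disparity d.

    With rectified images, the epipolar line of left pixel x is the same row
    of the right image, so the candidate match for disparity d is x - d.
    Candidates that fall off the left edge get an infinite cost.
    """
    volume = []
    for x, left_val in enumerate(left_row):
        costs = []
        for d in range(max_disp + 1):
            xr = x - d
            costs.append(abs(left_val - right_row[xr]) if xr >= 0 else INF)
        volume.append(costs)
    return volume


def winner_take_all(volume: List[List[float]]) -> List[int]:
    """Pick the lowest-cost disparity per pixel (ties go to the smaller d)."""
    return [min(range(len(costs)), key=costs.__getitem__) for costs in volume]


def disparity_to_depth(disp: float, focal_px: float, baseline_m: float) -> float:
    """Pinhole stereo geometry: Z = f * B / d; zero disparity maps to infinity."""
    if disp <= 0:
        return INF
    return focal_px * baseline_m / disp


if __name__ == "__main__":
    left = [0, 0, 10, 50, 10, 0, 0, 0]
    right = [10, 50, 10, 0, 0, 0, 0, 0]  # same row, content shifted 2 px left
    disparities = winner_take_all(row_cost_volume(left, right, max_disp=3))
    print(disparities[3:5])                                        # [2, 2]
    print(disparity_to_depth(36, focal_px=720.0, baseline_m=0.5))  # 10.0
```

Restricting the search to one row is what makes the problem tractable: each pixel compares only `max_disp + 1` candidates instead of the whole right image. Textureless pixels (such as the flat zeros above) still produce ties, which is the kind of residual ambiguity that global context like a BEV representation is meant to help with.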

Code & Models

Repositories

Arlo0o/StereoScene
PyTorch · Official

Taxonomy

Topics: Advanced Vision and Imaging · Computer Graphics and Visualization Techniques · Advanced Image Processing Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Adam · Absolute Position Encodings · Softmax · Residual Connection · Byte Pair Encoding · Label Smoothing · Position-Wise Feed-Forward Layer