ET-Former: Efficient Triplane Deformable Attention for 3D Semantic Scene Completion From Monocular Camera
Jing Liang, He Yin, Xuewei Qi, Jong Jin Park, Min Sun, Rajasimman Madhivanan, Dinesh Manocha

TL;DR
ET-Former is an end-to-end method that uses a novel triplane deformable attention mechanism and a CVAE to improve 3D semantic scene completion from monocular images, achieving state-of-the-art accuracy with low memory use.
Contribution
The paper introduces a triplane deformable attention mechanism and uncertainty estimation via CVAE for monocular 3D scene completion, advancing geometric understanding and efficiency.
Findings
Achieves the highest IoU and mIoU scores on the Semantic-KITTI dataset.
Reduces GPU memory usage compared to previous methods.
Improves SOTA IoU from 44.71 to 51.49.
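For readers unfamiliar with the two metrics, the sketch below shows how IoU and mIoU are commonly computed for semantic scene completion: IoU scores binary occupancy (occupied vs. empty voxels), while mIoU averages per-class IoU over the semantic classes. This is a minimal illustration, not the benchmark's official evaluation code. The label id `EMPTY = 0`, the flat list layout, and the function names are assumptions made for the example. Official evaluations typically accumulate counts over the whole dataset rather than over a single scene.

```python
from typing import Sequence

EMPTY = 0  # assumed label id for free space (hypothetical convention)


def completion_iou(pred: Sequence[int], target: Sequence[int]) -> float:
    """Binary occupancy IoU: a voxel is occupied if its label is not EMPTY."""
    inter = sum(1 for p, t in zip(pred, target) if p != EMPTY and t != EMPTY)
    union = sum(1 for p, t in zip(pred, target) if p != EMPTY or t != EMPTY)
    return inter / union if union else 0.0


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean of per-class IoU over semantic classes 1..num_classes-1.

    Classes absent from both prediction and target are skipped here,
    which is a simplification chosen for this sketch.
    """
    ious = []
    for c in range(1, num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy scene with six voxels and two semantic classes (1 and 2).
pred = [0, 1, 1, 2, 0, 2]
target = [0, 1, 2, 2, 1, 0]
print(completion_iou(pred, target))  # 3 shared occupied voxels / 5 in the union
print(mean_iou(pred, target, num_classes=3))
```

The gap between the two numbers in the toy scene shows why both are reported: a model can place occupied voxels well while still assigning many of them the wrong class.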
Abstract
We introduce ET-Former, a novel end-to-end algorithm for semantic scene completion using a single monocular camera. Our approach generates a semantic occupancy map from a single RGB observation while simultaneously providing uncertainty estimates for the semantic predictions. By designing a triplane-based deformable attention mechanism, our approach achieves a better geometric understanding of the scene than other SOTA approaches and reduces noise in semantic predictions. Additionally, through the use of a Conditional Variational AutoEncoder (CVAE), we estimate the uncertainties of these predictions. The generated semantic and uncertainty maps will help formulate navigation strategies that facilitate safe and permissible decision making in the future. Evaluated on the Semantic-KITTI dataset, ET-Former achieves the highest Intersection over Union (IoU) and mean IoU (mIoU) scores while maintaining the…
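The abstract does not say how the CVAE's uncertainty map is derived. One common approach with a conditional generative model is to decode several latent samples for the same input and measure how much the per-voxel class distributions disagree. The sketch below illustrates that generic idea with predictive entropy. It is an assumption-laden illustration, not ET-Former's procedure: `predictive_entropy`, `toy_decode`, the standard-normal latent prior, and the two-voxel toy scene are all hypothetical.

```python
import math
import random
from typing import Callable, List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one voxel's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predictive_entropy(
    decode: Callable[[List[float]], List[List[float]]],
    latent_dim: int,
    num_samples: int = 64,
    seed: int = 0,
) -> List[float]:
    """Per-voxel entropy of the class distribution averaged over latent samples.

    `decode` maps a latent vector to probabilities shaped
    [num_voxels][num_classes]; in a real CVAE it would also be
    conditioned on image features (assumed to be closed over here).
    """
    if num_samples < 1:
        raise ValueError("num_samples must be at least 1")
    rng = random.Random(seed)
    mean_probs: List[List[float]] = []
    for _ in range(num_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]  # assumed N(0, I) prior
        probs = decode(z)
        if not mean_probs:
            mean_probs = [[0.0] * len(row) for row in probs]
        for v, row in enumerate(probs):
            for c, p in enumerate(row):
                mean_probs[v][c] += p / num_samples
    return [-sum(p * math.log(p) for p in row if p > 0.0) for row in mean_probs]


def toy_decode(z: List[float]) -> List[List[float]]:
    """Hypothetical decoder: voxel 0 ignores z, voxel 1 flips class with z[0]."""
    return [softmax([8.0, 0.0]), softmax([4.0 * z[0], 0.0])]


entropy = predictive_entropy(toy_decode, latent_dim=1)
print(entropy)  # voxel 1 should be far more uncertain than voxel 0
```

A planner could threshold such a map to treat high-entropy voxels conservatively, which matches the navigation use the abstract describes for the uncertainty output.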
Taxonomy
Topics: Advanced Vision and Imaging · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques
Methods: Softmax · Attention Is All You Need
