TL;DR
This paper introduces a novel diffusion-based neural model called Skip Mamba Diffusion for monocular 3D semantic scene completion, achieving state-of-the-art results by processing data in a latent space with an innovative denoiser.
Contribution
It presents the Skimba denoiser and a diffusion model in the state space for efficient 3D scene completion from monocular images, outperforming existing monocular methods.
Findings
Outperforms other monocular techniques significantly
Achieves competitive results with stereo methods
Demonstrates effectiveness on SemanticKITTI and KITTI360 datasets
Abstract
3D semantic scene completion is critical for multiple downstream tasks in autonomous systems. It estimates missing geometric and semantic information in the acquired scene data. Due to the challenging real-world conditions, this task usually demands complex models that process multi-modal data to achieve acceptable performance. We propose a unique neural model, leveraging advances from the state space and diffusion generative modeling to achieve remarkable 3D semantic scene completion performance with monocular image input. Our technique processes the data in the conditioned latent space of a variational autoencoder where diffusion modeling is carried out with an innovative state space technique. A key component of our neural network is the proposed Skimba (Skip Mamba) denoiser, which is adept at efficiently processing long-sequence data. The Skimba diffusion model is integral to our 3D…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
Taxonomy
MethodsADaptive gradient method with the OPTimal convergence rate · Diffusion · Mamba: Linear-Time Sequence Modeling with Selective State Spaces
