MambaTron: Efficient Cross-Modal Point Cloud Enhancement using Aggregate Selective State Space Modeling

Sai Tarun Inaganti; Gennady Petrenko

arXiv:2501.16384 · cs.CV · March 21, 2025

TL;DR

MambaTron introduces an efficient Mamba-Transformer based method for cross-modal point cloud enhancement, leveraging long-sequence processing to achieve competitive performance with reduced computational costs.

Contribution

This work pioneers the application of Mamba-based cross-attention in multi-modal 3D vision tasks, specifically for view-guided point cloud completion.

Findings

01. Achieves performance comparable to state-of-the-art methods.
02. Uses significantly fewer computational resources.
03. Demonstrates effective cross-modal reconstruction capabilities.

Abstract

Point cloud enhancement is the process of generating a high-quality point cloud from an incomplete input. One way to do this is to fill in the missing details by regression against a reference such as the ground truth. In addition to unimodal image and point cloud reconstruction, we focus on the task of view-guided point cloud completion, where the missing information is gathered from an image that represents a view of the point cloud and is then used to generate the output point cloud. Amid recent research on state-space models, first in natural language processing and now in 2D and 3D vision, Mamba has shown promising results as an efficient alternative to the self-attention mechanism. However, there is limited research on employing Mamba for cross-attention between the image and the input point cloud, which is crucial in multi-modal problems. In this paper,…
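
For context, the cross-attention between the image and the input point cloud mentioned in the abstract can be illustrated with standard scaled dot-product attention. The sketch below is not the MambaTron method, which this page does not describe in detail; it is a plain-Python illustration of the conventional mechanism, assuming point features act as queries and image features act as keys and values. The function names, the toy data, and the omission of learned projection matrices are all assumptions made for brevity.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(point_tokens, image_tokens):
    """Let every point token attend over all image tokens.

    Queries come from the point cloud; keys and values come from the image.
    Learned query/key/value projections are omitted (treated as identity),
    which is a simplification made for this illustration only.
    """
    dim = len(point_tokens[0])
    scale = 1.0 / math.sqrt(dim)
    fused = []
    for query in point_tokens:
        # One similarity score per image token.
        scores = [
            scale * sum(q * k for q, k in zip(query, key))
            for key in image_tokens
        ]
        weights = softmax(scores)
        # Weighted average of the image tokens, one feature at a time.
        fused.append([
            sum(w * value[d] for w, value in zip(weights, image_tokens))
            for d in range(dim)
        ])
    return fused


# Hypothetical toy data: 3 point tokens and 4 image tokens, feature size 2.
points = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pixels = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.0]]
fused = cross_attention(points, pixels)
```

With N point tokens and M image tokens, this computes N × M similarity scores, so the cost grows with the product of the two sequence lengths. That product is the cost that motivates looking for alternatives when both sequences are long.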

Taxonomy

Topics: 3D Shape Modeling and Analysis · 3D Modeling in Geospatial Applications · Computer Graphics and Visualization Techniques

Methods: Mamba: Linear-Time Sequence Modeling with Selective State Spaces · Focus