MambaFlow: A Novel and Flow-guided State Space Model for Scene Flow Estimation
Jiehao Luo, Jintao Cheng, Xiaoyu Tang, Qingwen Zhang, Bohuan Xue, Rui Fan

TL;DR
MambaFlow introduces a flow-guided state space model with a Mamba-based decoder for accurate, real-time scene flow estimation from point clouds. It addresses two weaknesses of prior methods: limited spatio-temporal modeling and the loss of fine-grained features.
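The abstract below describes Mamba as a state space model that enables global modeling with linear complexity. The sketch that follows gives a rough illustration of that idea. It runs a Mamba-style selective scan over a single input channel using the discretized recurrence h_t = exp(Δ_t A) h_{t-1} + Δ_t B_t x_t and y_t = C_t · h_t, where the step size Δ_t, B_t and C_t all depend on the current input.

The projections that produce these quantities (`w_delta`, `b_delta`, `w_b`, `w_c`) are hypothetical toy choices. None of this is MambaFlow's code, and the plain Python loop stands in for Mamba's hardware-aware parallel scan.

```python
import math
from typing import List, Sequence


def softplus(z: float) -> float:
    """Numerically stable softplus; keeps the step size delta positive."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def selective_scan(
    xs: Sequence[float],
    a: Sequence[float],      # diagonal of the state matrix A (negative for stability)
    w_b: Sequence[float],    # hypothetical projection: B_t = w_b * x_t
    w_c: Sequence[float],    # hypothetical projection: C_t = w_c * x_t
    w_delta: float = 1.0,    # hypothetical projection: delta_t = softplus(w_delta * x_t + b_delta)
    b_delta: float = 0.0,
) -> List[float]:
    """Single-channel selective SSM: one left-to-right pass, O(L * N) work."""
    n = len(a)
    h = [0.0] * n            # hidden state carried across the whole sequence
    ys: List[float] = []
    for x in xs:
        delta = softplus(w_delta * x + b_delta)   # input-dependent step size
        y = 0.0
        for i in range(n):
            a_bar = math.exp(delta * a[i])        # zero-order-hold discretization of A
            b_bar = delta * (w_b[i] * x)          # simplified (Euler) discretization of B_t
            h[i] = a_bar * h[i] + b_bar * x       # state update
            y += (w_c[i] * x) * h[i]              # readout through input-dependent C_t
        ys.append(y)
    return ys
```

The hidden state `h` is updated once per element, so the cost grows linearly with the sequence length L. Full self-attention, by contrast, compares every pair of elements. The recurrence is also causal: the output at step t depends only on inputs up to t.

For a voxel-based scene flow network, the sequence would presumably be voxel features flattened in some serialization order. That ordering is an assumption of this sketch and is not specified here.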
Contribution
The paper proposes MambaFlow, a novel scene flow network that uses a Mamba-based decoder and a scene-adaptive loss to improve accuracy and generalization in urban scenarios.
Findings
Achieves state-of-the-art performance on the Argoverse 2 benchmark
Enables real-time scene flow estimation in complex urban environments
Demonstrates superior accuracy over existing methods
Abstract
Scene flow estimation aims to predict 3D motion from consecutive point cloud frames and is of great interest in autonomous driving. Existing methods face challenges such as insufficient spatio-temporal modeling and the inherent loss of fine-grained features during voxelization. The success of Mamba, a representative state space model (SSM) that enables global modeling with linear complexity, offers a promising solution. In this paper, we propose MambaFlow, a novel scene flow estimation network with a Mamba-based decoder. A well-designed backbone enables deep interaction and coupling of spatio-temporal features. In addition, an efficient Mamba-based decoder steers the global attention modeling of voxel-based features with point offset information, learning voxel-to-point patterns that are used to devoxelize shared voxel representations into point-wise…
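The abstract's point about voxelization can be made concrete with a small sketch. It assumes plain average pooling into a uniform grid and a hypothetical devoxelization that simply concatenates features. Points that fall into the same voxel receive an identical pooled feature. Appending each point's offset from its voxel center is the simplest way to make those points distinguishable again.

According to the abstract, MambaFlow instead learns this voxel-to-point mapping with an offset-guided Mamba-based decoder. None of the function names below come from the paper.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]
VoxelKey = Tuple[int, ...]


def voxel_key(p: Point, voxel_size: float) -> VoxelKey:
    """Integer grid index of the voxel that contains point p."""
    return tuple(math.floor(c / voxel_size) for c in p)


def voxelize(points: Sequence[Point], feats: Sequence[List[float]],
             voxel_size: float) -> Dict[VoxelKey, List[float]]:
    """Average-pool per-point features into voxels; detail inside a voxel is lost."""
    sums: Dict[VoxelKey, List[float]] = {}
    counts: Dict[VoxelKey, int] = defaultdict(int)
    for p, f in zip(points, feats):
        k = voxel_key(p, voxel_size)
        acc = sums.setdefault(k, [0.0] * len(f))
        sums[k] = [s + v for s, v in zip(acc, f)]
        counts[k] += 1
    return {k: [s / counts[k] for s in acc] for k, acc in sums.items()}


def point_offset(p: Point, voxel_size: float) -> List[float]:
    """Offset of p from the center of the voxel that contains it."""
    center = [(i + 0.5) * voxel_size for i in voxel_key(p, voxel_size)]
    return [c - m for c, m in zip(p, center)]


def devoxelize(points: Sequence[Point], voxel_feats: Dict[VoxelKey, List[float]],
               voxel_size: float) -> List[List[float]]:
    """Hypothetical devoxelization: shared voxel feature concatenated with the point's offset."""
    return [voxel_feats[voxel_key(p, voxel_size)] + point_offset(p, voxel_size)
            for p in points]
```

Consider `voxel_size=0.5` with points at x = 0.1 and x = 0.4 carrying features 1.0 and 3.0. Both points land in voxel (0, 0, 0) and share the pooled feature 2.0. Their offsets along x (-0.15 and 0.15) still tell them apart. A learned decoder can exploit the same offset signal far more expressively than concatenation.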
Taxonomy
Topics: Advanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis · Advanced Image Processing Techniques
Methods: Softmax · Attention Is All You Need · Mamba: Linear-Time Sequence Modeling with Selective State Spaces · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
