MAMBA4D: Efficient Long-Sequence Point Cloud Video Understanding with Disentangled Spatial-Temporal State Space Models
Jiuming Liu, Jinru Han, Lihao Liu, Angelica I. Aviles-Rivero, Chaokang Jiang, Zhe Liu, Hesheng Wang

TL;DR
Mamba4D introduces a novel state space model-based backbone for efficient long-sequence point cloud video understanding, disentangling spatial and temporal features to improve accuracy and computational efficiency.
Contribution
The paper proposes a new point cloud video backbone using disentangled spatial-temporal state space models with Mamba blocks, reducing complexity and enhancing performance.
Findings
+10.4% accuracy on MSR-Action3D
0.7 F1 Score improvement on HOI4D
87.5% GPU memory reduction and 5.36x speed-up for long videos
Abstract
Point cloud videos can faithfully capture real-world spatial geometries and temporal dynamics, which are essential for enabling intelligent agents to understand the dynamically changing world. However, designing an effective 4D backbone remains challenging, mainly due to the irregular and unordered distribution of points and temporal inconsistencies across frames. Also, recent transformer-based 4D backbones commonly suffer from large computational costs due to their quadratic complexity, particularly for long video sequences. To address these challenges, we propose a novel point cloud video understanding backbone purely based on the State Space Models (SSMs). Specifically, we first disentangle space and time in 4D video sequences and then establish the spatio-temporal correlation with our designed Mamba blocks. The Intra-frame Spatial Mamba module is developed to encode locally similar…
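As a rough illustration of why a state space backbone scales linearly with sequence length, the toy sketch below runs a scalar linear recurrence over a sequence in a single pass. It is not the paper's Mamba block: the fixed parameters `a`, `b`, `c` and the helper names `ssm_scan` and `two_stage_scan` are assumptions made for this example, and real Mamba blocks use learned, input-dependent parameters over vector-valued features. The two-stage function only mimics the idea of handling space and time separately (a scan inside each frame, then a scan over per-frame summaries); how the paper builds each stage is not specified in the text above.

```python
from typing import List


def ssm_scan(xs: List[float], a: float = 0.5, b: float = 1.0, c: float = 1.0) -> List[float]:
    """Scalar linear state space recurrence (toy; fixed a, b, c are assumptions).

        h_t = a * h_{t-1} + b * x_t
        y_t = c * h_t

    One pass over the sequence, so the cost is linear in len(xs).
    """
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x
        ys.append(c * h)
    return ys


def two_stage_scan(frames: List[List[float]]) -> List[float]:
    """Hypothetical disentangled pass: scan inside each frame, then across frames.

    Each frame is a non-empty list of per-point scalar features.
    """
    # Stage 1 (space): summarize each frame by the last output of an intra-frame scan.
    frame_features = [ssm_scan(frame)[-1] for frame in frames]
    # Stage 2 (time): scan the per-frame summaries in temporal order.
    return ssm_scan(frame_features)


frames = [[1.0, 0.0], [0.0, 2.0], [4.0, 4.0]]
print(two_stage_scan(frames))  # [0.5, 2.25, 7.125]
```

Each call touches every element once, so doubling the number of frames roughly doubles the work. Self-attention, by contrast, compares every token with every other token, which is where the quadratic cost mentioned in the abstract comes from.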
Taxonomy
Topics: 3D Shape Modeling and Analysis · Remote Sensing and LiDAR Applications · Computer Graphics and Visualization Techniques
