UST-SSM: Unified Spatio-Temporal State Space Models for Point Cloud Video Modeling
Peiming Li, Ziyi Wang, Yulin Yuan, Hong Liu, Xiangming Meng, Junsong Yuan, Mengyuan Liu

TL;DR
The paper introduces UST-SSM, a novel model for point cloud video analysis that reorganizes unordered points into semantic sequences and enhances spatio-temporal feature aggregation, improving action recognition accuracy.
Contribution
The paper proposes UST-SSM, which extends state space models with Spatial-Temporal Selection Scanning (STSS), structure aggregation, and interaction sampling to better model point cloud videos.
Findings
UST-SSM is reported to be effective on multiple datasets, including MSR-Action3D, NTU RGB+D, and Synthia 4D.
It improves spatio-temporal feature utilization and action recognition accuracy.
The code is publicly available via a GitHub link provided in the paper.
Abstract
Point cloud videos capture dynamic 3D motion while reducing the effects of lighting and viewpoint variations, making them highly effective for recognizing subtle and continuous human actions. Although Selective State Space Models (SSMs) have shown good performance in sequence modeling with linear complexity, the spatio-temporal disorder of point cloud videos hinders their unidirectional modeling when directly unfolding the point cloud video into a 1D sequence through temporally sequential scanning. To address this challenge, we propose the Unified Spatio-Temporal State Space Model (UST-SSM), which extends the latest advancements in SSMs to point cloud videos. Specifically, we introduce Spatial-Temporal Selection Scanning (STSS), which reorganizes unordered points into semantic-aware sequences through prompt-guided clustering, thereby enabling the effective utilization of points that are…
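The ordering problem described in the abstract can be illustrated with a small sketch. It is not the authors' STSS implementation: the `prompts` array, the dot-product cluster assignment, and the fixed scalars `a` and `b` are assumptions made for illustration, and the recurrence is a plain linear scan rather than a selective (input-dependent) SSM. The sketch only shows how a point cloud video of shape (T, N, C) could be unfolded into a 1D sequence either frame by frame or grouped by cluster, and how a unidirectional recurrence then consumes that sequence in a single pass.

```python
import numpy as np


def temporal_scan_order(num_frames, num_points):
    """Frame-by-frame flattening: all points of frame 0, then frame 1, and so on."""
    return np.arange(num_frames * num_points)


def cluster_scan_order(features, prompts):
    """Group points by their best-matching prompt, then by frame index.

    Hypothetical stand-in for prompt-guided clustering.
    features: (T, N, C) per-point features; prompts: (K, C) cluster prototypes.
    Returns (order, cluster), where `order` is a permutation of the T*N
    flattened point indices and `cluster` is the cluster id of each point.
    """
    T, N, C = features.shape
    flat = features.reshape(T * N, C)
    sim = flat @ prompts.T                # (T*N, K) similarity scores
    cluster = sim.argmax(axis=1)          # hard assignment to one prompt
    frame = np.repeat(np.arange(T), N)    # frame index of every flattened point
    # np.lexsort treats the LAST key as the primary key: cluster, then frame.
    order = np.lexsort((frame, cluster))
    return order, cluster


def linear_scan(x, a=0.9, b=0.1):
    """Unidirectional recurrence h_t = a * h_{t-1} + b * x_t over a (L, C) sequence.

    One pass over the sequence, so the cost grows linearly with L.
    """
    h = np.zeros(x.shape[1])
    out = np.empty_like(x, dtype=float)
    for t in range(x.shape[0]):
        h = a * h + b * x[t]
        out[t] = h
    return out


# Toy usage: 4 frames, 8 points per frame, 3 feature channels, 2 prompts.
rng = np.random.default_rng(0)
video = rng.normal(size=(4, 8, 3))
prompts = rng.normal(size=(2, 3))

order, cluster = cluster_scan_order(video, prompts)
sequence = video.reshape(-1, 3)[order]    # cluster-grouped 1D sequence
hidden = linear_scan(sequence)            # (32, 3) hidden states
```

Because the recurrence only looks backward, each hidden state summarizes whatever preceded it in the sequence. With frame-by-frame flattening, neighbors in the sequence are points that happen to be adjacent in storage order; with the cluster-grouped order, they are points assigned to the same group across frames.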
Taxonomy
Topics: Remote Sensing and LiDAR Applications · 3D Shape Modeling and Analysis · 3D Surveying and Cultural Heritage
