TL;DR
STS-Mixer takes a spectral-domain approach to 4D point cloud video understanding: it transforms point cloud videos into graph spectral signals and decomposes them into frequency bands, each capturing distinct geometric structure, alongside the usual spatial and temporal information.
Contribution
It proposes a novel spectral analysis framework and a unified spatio-temporal-spectral mixer for improved 4D point cloud video understanding.
Findings
Achieves superior performance on 3D action recognition benchmarks.
Outperforms existing methods in 4D semantic segmentation.
Spectral decomposition separates geometry by scale: low-frequency signals capture coarse shapes, while high-frequency signals encode fine-grained geometric details.
Abstract
4D point cloud videos capture rich spatial and temporal dynamics of scenes which possess unique values in various 4D understanding tasks. However, most existing methods work in the spatiotemporal domain where the underlying geometric characteristics of 4D point cloud videos are hard to capture, leading to degraded representation learning and understanding of 4D point cloud videos. We address the above challenge from a complementary spectral perspective. By transforming 4D point cloud videos into graph spectral signals, we can decompose them into multiple frequency bands each of which captures distinct geometric structures of point cloud videos. Our spectral analysis reveals that the decomposed low-frequency signals capture more coarse shapes while high-frequency signals encode more fine-grained geometry details. Building on these observations, we design Spatio-Temporal-Spectral Mixer…
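The band decomposition described in the abstract can be illustrated with a generic graph Fourier transform on a single point cloud frame. The sketch below is not the paper's implementation: the k-NN graph construction, the unnormalized Laplacian, the equal-width band split, and the function names `knn_graph_laplacian` and `split_frequency_bands` are all assumptions chosen for illustration.

```python
import numpy as np


def knn_graph_laplacian(points, k=4):
    """Combinatorial Laplacian L = D - W of a symmetrized k-NN graph.

    Assumes `points` has shape (n, 3) with no duplicate points.
    """
    n = points.shape[0]
    # Pairwise squared Euclidean distances, shape (n, n).
    diff = points[:, None, :] - points[None, :, :]
    dist2 = np.sum(diff ** 2, axis=-1)
    # k nearest neighbours per point; column 0 of the sort is the point itself.
    neighbours = np.argsort(dist2, axis=1)[:, 1:k + 1]
    W = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    W[rows, neighbours.ravel()] = 1.0
    W = np.maximum(W, W.T)  # make the adjacency symmetric
    D = np.diag(W.sum(axis=1))
    return D - W


def split_frequency_bands(points, num_bands=2, k=4):
    """Split point coordinates into graph-frequency bands (low to high).

    Illustrative choice: eigenvectors are sorted by eigenvalue and divided
    into `num_bands` groups of (nearly) equal size.
    """
    L = knn_graph_laplacian(points, k)
    eigvals, U = np.linalg.eigh(L)  # eigenvalues in ascending order
    coeffs = U.T @ points           # graph Fourier transform, shape (n, 3)
    bands = []
    for idx in np.array_split(np.arange(len(eigvals)), num_bands):
        # Inverse transform restricted to this group of graph frequencies.
        bands.append(U[:, idx] @ coeffs[idx])
    return bands


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.normal(size=(32, 3))  # stand-in for one point cloud frame
    low, high = split_frequency_bands(frame, num_bands=2)
    # The bands sum back to the original coordinates.
    print(np.allclose(low + high, frame))
```

Because the eigenvectors form an orthonormal basis, the bands sum back to the input exactly. In this toy setup the first band holds the smoothest graph signals (coarse shape) and the last band holds the most rapidly varying ones (fine detail), which matches the low- and high-frequency roles described in the abstract.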
