MS-Temba: Multi-Scale Temporal Mamba for Understanding Long Untrimmed Videos
Arkaprava Sinha, Monish Soundar Raj, Pu Wang, Ahmed Helmy, Hieu Le, Srijan Das

TL;DR
MS-Temba introduces a multi-scale temporal modeling approach using dilated state-space models to improve action detection and summarization in long untrimmed videos, achieving state-of-the-art results with fewer parameters.
Contribution
The paper proposes MS-Temba, a novel multi-scale temporal Mamba architecture with dilated SSMs and a fusion mechanism, enhancing long-range and fine-grained temporal understanding in videos.
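As a rough illustration of what "dilated" temporal modeling with multi-scale fusion can mean, the sketch below runs a toy scalar linear recurrence at several dilation rates and averages the results. It is not the authors' implementation. The function names (`dilated_scan`, `multi_scale_features`), the fixed coefficients `a` and `b`, and the averaging step are all assumptions made for illustration. Mamba itself uses input-dependent (selective) parameters and vector-valued states, and the paper's actual fusion mechanism is not described in this summary.

```python
from typing import List, Sequence


def dilated_scan(x: Sequence[float], dilation: int,
                 a: float = 0.5, b: float = 1.0) -> List[float]:
    """Toy linear recurrence h[t] = a * h[t - dilation] + b * x[t].

    Hypothetical stand-in for a dilated SSM: each step looks back
    `dilation` frames instead of one, so larger dilations cover a
    longer temporal span with the same number of recurrence steps.
    """
    if dilation < 1:
        raise ValueError("dilation must be >= 1")
    h = [0.0] * len(x)
    for t, x_t in enumerate(x):
        # State from `dilation` steps back; zero before the sequence starts.
        prev = h[t - dilation] if t >= dilation else 0.0
        h[t] = a * prev + b * x_t
    return h


def multi_scale_features(x: Sequence[float],
                         dilations: Sequence[int] = (1, 2, 4)) -> List[float]:
    """Run the toy scan at several dilations and average per time step.

    Averaging is a placeholder for fusion, chosen only for simplicity.
    """
    scales = [dilated_scan(x, d) for d in dilations]
    return [sum(vals) / len(vals) for vals in zip(*scales)]


if __name__ == "__main__":
    frames = [1.0, 1.0, 1.0, 1.0]
    print(dilated_scan(frames, dilation=1))
    print(dilated_scan(frames, dilation=2))
    print(multi_scale_features(frames, dilations=(1, 2)))
```

With dilation 1 every frame influences its immediate successor, while with dilation 2 even and odd frames evolve as two independent interleaved streams. That is the sense in which different dilation rates expose different temporal scales.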
Findings
Achieves state-of-the-art results on the ADL benchmarks TSU and Charades.
Sets a new state of the art on the video summarization datasets TVSum and SumMe.
Operates efficiently with only 17 million parameters.
Abstract
Temporal Action Detection (TAD) in untrimmed videos poses significant challenges, particularly for Activities of Daily Living (ADL), requiring models to (1) process long-duration videos, (2) capture temporal variations in actions, and (3) simultaneously detect dense overlapping actions. Existing CNN and Transformer-based approaches struggle to jointly capture fine-grained detail and long-range structure at scale. The State-Space Model (SSM) based Mamba offers powerful long-range modeling, but naive application to TAD collapses fine-grained temporal structure and fails to account for the challenges inherent to TAD. To this end, we propose Multi-Scale Temporal Mamba (MS-Temba), which extends Mamba to TAD with newly introduced dilated SSMs. Each Temba block, comprising dilated SSMs coupled with our proposed additional losses, enables the learning of discriminative representations across…
Taxonomy
Topics: Human Pose and Action Recognition · Video Analysis and Summarization · Anomaly Detection Techniques and Applications
Methods: Mamba: Linear-Time Sequence Modeling with Selective State Spaces
