Enhancing Temporal Action Localization: Advanced S6 Modeling with Recurrent Mechanism
Sangyoun Lee, Juho Jung, Changdae Oh, Sunghee Yun

TL;DR
This paper introduces an advanced S6-based architecture with recurrent mechanisms for temporal action localization, significantly improving accuracy by better modeling long-range dependencies and temporal causality in videos.
Contribution
It proposes a novel TAL model integrating Feature Aggregated Bi-S6, Dual Bi-S6, and recurrent mechanisms, achieving state-of-the-art results without increasing parameter complexity.
Findings
Achieved state-of-the-art mAP scores on THUMOS-14, ActivityNet, FineAction, and HACS.
Validated the effectiveness of the Dual structure and recurrent mechanisms.
Demonstrated superior long-range dependency modeling.
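For context on the mAP figures: temporal action localization benchmarks usually count a predicted segment as correct when its temporal IoU (tIoU) with a ground-truth segment exceeds a threshold, and mAP is then averaged over several thresholds. The thresholds used in this paper are not listed on this page, so the sketch below shows only the tIoU computation. The function name and the (start, end) tuple layout are illustrative assumptions, not taken from the paper's code.

```python
def temporal_iou(pred, gt):
    """Temporal IoU between two (start, end) segments given in seconds.

    Illustrative helper; the name and argument layout are assumptions.
    """
    (ps, pe), (gs, ge) = pred, gt
    # Overlap is zero when the segments are disjoint.
    inter = max(0.0, min(pe, ge) - max(ps, gs))
    union = (pe - ps) + (ge - gs) - inter
    return inter / union if union > 0 else 0.0


# Two 10-second segments sharing 5 seconds: 5 / 15
overlap = temporal_iou((0.0, 10.0), (5.0, 15.0))
```

A prediction that overlaps the ground truth only loosely scores well at a low threshold but fails at a high one, which is why precise start and end times matter for this metric.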
Abstract
Temporal Action Localization (TAL) is a critical task in video analysis, identifying precise start and end times of actions. Existing methods like CNNs, RNNs, GCNs, and Transformers have limitations in capturing long-range dependencies and temporal causality. To address these challenges, we propose a novel TAL architecture leveraging the Selective State Space Model (S6). Our approach integrates the Feature Aggregated Bi-S6 block, Dual Bi-S6 structure, and a recurrent mechanism to enhance temporal and channel-wise dependency modeling without increasing parameter complexity. Extensive experiments on benchmark datasets demonstrate state-of-the-art results with mAP scores of 74.2% on THUMOS-14, 42.9% on ActivityNet, 29.6% on FineAction, and 45.8% on HACS. Ablation studies validate our method's effectiveness, showing that the Dual structure in the Stem module and the recurrent mechanism…
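To make the S6 idea concrete, here is a minimal single-channel sketch of a selective state space recurrence run in both temporal directions. It is not the paper's Feature Aggregated Bi-S6 or Dual Bi-S6 block. The parameter names (`a`, `w_b`, `w_c`, `w_delta`), the scalar projections, and the choice to sum the two directions are all simplifying assumptions made for illustration.

```python
import math


def softplus(z):
    # Keeps the step size positive.
    return math.log1p(math.exp(z))


def selective_scan(xs, a, w_b, w_c, w_delta):
    """Simplified single-channel selective SSM scan.

    a: negative diagonal state decay rates (length N).
    w_b, w_c, w_delta: hypothetical projections that make B, C and the
    step size depend on the current input (the "selective" part).
    """
    h = [0.0] * len(a)
    ys = []
    for x in xs:
        delta = softplus(w_delta * x)      # input-dependent step size
        b = [wb * x for wb in w_b]         # input-dependent B_t
        c = [wc * x for wc in w_c]         # input-dependent C_t
        # Discretized update: h_t = exp(delta * A) * h_{t-1} + delta * B_t * x_t
        h = [math.exp(delta * a_n) * h_n + delta * b_n * x
             for a_n, h_n, b_n in zip(a, h, b)]
        ys.append(sum(c_n * h_n for c_n, h_n in zip(c, h)))
    return ys


def bidirectional_scan(xs, params_fwd, params_bwd):
    """Run the scan forward and backward in time and sum the outputs.

    Summation is an assumption; other fusion rules are possible.
    """
    fwd = selective_scan(xs, *params_fwd)
    bwd = selective_scan(xs[::-1], *params_bwd)[::-1]
    return [f + b for f, b in zip(fwd, bwd)]


params = ([-1.0, -0.5], [1.0, 0.5], [0.3, 0.7], 1.0)
features = [1.0, 0.5, 2.0]
out = bidirectional_scan(features, params, params)
```

The forward scan alone is causal: each output depends only on the current and earlier inputs. Adding the backward scan lets every time step also see later frames, which is the usual motivation for bidirectional variants in offline video analysis.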
Taxonomy
Topics: Human Pose and Action Recognition · Data Visualization and Analytics
