Structured Attention Composition for Temporal Action Localization
Le Yang, Junwei Han, Tao Zhao, Nian Liu, Dingwen Zhang

TL;DR
This paper introduces a structured attention composition module for multi-modality feature learning in temporal action localization. By modeling frame attention and modality (appearance vs. motion) attention jointly rather than independently, it improves localization accuracy.
Contribution
The paper proposes a novel structured attention composition module that encodes the relationship between frames and modalities using optimal transport. The module is added to existing action localization frameworks and improves them.
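The contribution above mentions optimal transport. A common way to compute a transport plan is entropy-regularized optimal transport solved with Sinkhorn iterations. The sketch below assumes that choice. It treats a frame attention vector as the row marginal and a modality attention vector as the column marginal, and it uses a made-up cost matrix. This page does not give the paper's cost definition, regularization, or solver, so `sinkhorn_plan`, `cost`, `eps`, and `n_iters` are all hypothetical illustrations and not the authors' implementation.

```python
import math
from typing import List

Matrix = List[List[float]]


def sinkhorn_plan(cost: Matrix, frame_att: List[float], modality_att: List[float],
                  eps: float = 0.5, n_iters: int = 500) -> Matrix:
    """Entropy-regularized optimal transport plan computed with Sinkhorn iterations.

    Assumption (illustration only, not the paper's formulation):
      cost[t][m]    hypothetical cost of assigning frame t's attention mass to modality m
      frame_att     length-T frame attention, used as the row marginal (sums to 1)
      modality_att  length-M modality attention, used as the column marginal (sums to 1)
    Returns a T x M plan whose row sums approach frame_att and whose column sums
    equal modality_att.
    """
    T, M = len(cost), len(cost[0])
    # Gibbs kernel: a lower cost gives a larger kernel value.
    K = [[math.exp(-cost[t][m] / eps) for m in range(M)] for t in range(T)]
    u, v = [1.0] * T, [1.0] * M
    for _ in range(n_iters):
        # Alternately rescale rows and columns so the plan matches both marginals.
        u = [frame_att[t] / sum(K[t][m] * v[m] for m in range(M)) for t in range(T)]
        v = [modality_att[m] / sum(K[t][m] * u[t] for t in range(T)) for m in range(M)]
    return [[u[t] * K[t][m] * v[m] for m in range(M)] for t in range(T)]


if __name__ == "__main__":
    frame_att = [0.1, 0.4, 0.4, 0.1]   # toy frame attention over T = 4 frames
    modality_att = [0.6, 0.4]          # toy [appearance, motion] attention
    # Made-up costs: frame 1 is cheap to assign to motion, frames 0 and 2 to appearance.
    cost = [[0.2, 0.8], [0.9, 0.1], [0.3, 0.7], [0.5, 0.5]]
    plan = sinkhorn_plan(cost, frame_att, modality_att)
    for t, row in enumerate(plan):
        print(f"frame {t}: appearance={row[0]:.3f} motion={row[1]:.3f}")
```

The sketch shows one design idea. Because the plan's rows and columns are both tied to the given attentions, frame and modality attention are composed jointly, while the cost still lets individual frames lean toward one modality.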
Findings
Consistently improves four state-of-the-art methods
Achieves new state-of-the-art on THUMOS14
Demonstrates the effectiveness of structured attention in multi-modality feature learning
Abstract
Temporal action localization aims at localizing action instances from untrimmed videos. Existing works have designed various effective modules to precisely localize action instances based on appearance and motion features. However, by treating these two kinds of features with equal importance, previous works cannot take full advantage of each modality feature, making the learned model still sub-optimal. To tackle this issue, we make an early effort to study temporal action localization from the perspective of multi-modality feature learning, based on the observation that different actions exhibit specific preferences to appearance or motion modality. Specifically, we build a novel structured attention composition module. Unlike conventional attention, the proposed module does not infer frame attention and modality attention independently. Instead, by casting the relationship between…
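The abstract contrasts the module with conventional attention, which infers frame attention and modality attention independently. A small toy example, not taken from the paper, shows why independence can be limiting. Suppose, purely as an illustrative assumption, that the two independent attentions are combined by an outer product. The resulting frame-by-modality matrix is rank one, so every frame gets the same appearance-to-motion ratio. It therefore cannot express that some frames favor motion while others favor appearance. The function names and numbers below are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def independent_composition(frame_att: List[float], modality_att: List[float]) -> Matrix:
    """Hypothetical baseline: combine separately inferred attentions by an outer product."""
    return [[f * m for m in modality_att] for f in frame_att]


def marginals(att: Matrix) -> Tuple[List[float], List[float]]:
    """Row sums (frame attention) and column sums (modality attention) of a T x M matrix."""
    rows = [sum(r) for r in att]
    cols = [sum(r[m] for r in att) for m in range(len(att[0]))]
    return rows, cols


def appearance_motion_ratio(att: Matrix) -> List[float]:
    """Per-frame appearance/motion weight ratio (assumes column 0 = appearance, 1 = motion)."""
    return [r[0] / r[1] for r in att]


if __name__ == "__main__":
    frame_att = [0.1, 0.4, 0.4, 0.1]
    modality_att = [0.6, 0.4]
    indep = independent_composition(frame_att, modality_att)
    # Hand-made matrix with the same marginals but a frame-dependent modality preference.
    structured = [[0.08, 0.02], [0.12, 0.28], [0.32, 0.08], [0.08, 0.02]]
    print(appearance_motion_ratio(indep))       # every frame: about 1.5
    print(appearance_motion_ratio(structured))  # frame 1 prefers motion, others appearance
    print(marginals(structured))
```

Both matrices share the same frame and modality marginals. Only the non-independent one lets frame 1 prefer motion. The hand-made `structured` matrix merely stands in for what a jointly inferred composition can represent.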
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Anomaly Detection Techniques and Applications
