Temporal Action Localization with Cross Layer Task Decoupling and Refinement
Qiang Li, Di Liu, Jun Kong, Sen Li, Hui Xu, Jianzhong Wang

TL;DR
This paper introduces a novel approach for temporal action localization that effectively disentangles classification and localization tasks using cross layer feature decoupling and refinement, leading to state-of-the-art results.
Contribution
The proposed CLTDR method integrates multi-layer features for better task decoupling and refinement, and introduces the lightweight GMG module for multi-granularity feature extraction.
Findings
Achieves state-of-the-art performance on five benchmarks.
Effectively disentangles classification and localization tasks.
Improves feature utilization with the GMG module.
Abstract
Temporal action localization (TAL) involves the dual tasks of classifying and localizing actions within untrimmed videos. However, the two tasks often have conflicting requirements for features. Existing methods typically employ separate heads for classification and localization but feed both heads the same input feature, leading to suboptimal performance. To address this issue, we propose a novel TAL method with Cross Layer Task Decoupling and Refinement (CLTDR). Based on the feature pyramid of a video, the CLTDR strategy integrates semantically strong features from higher pyramid layers and detailed boundary-aware features from lower pyramid layers to effectively disentangle the action classification and localization tasks. Moreover, the multiple features from cross layers are also employed to refine and align the disentangled classification and regression results. Finally, a lightweight Gated…
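The cross-layer idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the scalar per-step features, the average-pooling pyramid, the helper names (`avg_pool`, `upsample`, `build_pyramid`, `decouple`), and the simple averaging used for fusion are all assumptions made for illustration. The sketch only shows the routing: the classification branch of a layer borrows from the next higher (coarser) layer, while the localization branch borrows from the next lower (finer) layer.

```python
# Toy illustration only. The fusion rules here are assumptions,
# not the CLTDR method as published.

def avg_pool(seq, stride=2):
    """Downsample a 1-D feature sequence by averaging non-overlapping windows."""
    return [sum(seq[i:i + stride]) / len(seq[i:i + stride])
            for i in range(0, len(seq), stride)]

def upsample(seq, length):
    """Nearest-neighbour upsampling of a sequence to the given length."""
    return [seq[i * len(seq) // length] for i in range(length)]

def build_pyramid(features, num_layers):
    """Layer 0 is the finest; each further layer halves the temporal length."""
    pyramid = [list(features)]
    for _ in range(num_layers - 1):
        pyramid.append(avg_pool(pyramid[-1]))
    return pyramid

def decouple(pyramid, layer):
    """Return (classification_feature, localization_feature) for one layer."""
    current = pyramid[layer]
    # Classification branch: mix in the coarser layer above (assumed fusion: mean).
    if layer + 1 < len(pyramid):
        higher = upsample(pyramid[layer + 1], len(current))
    else:
        higher = current
    cls_feat = [(c + h) / 2 for c, h in zip(current, higher)]
    # Localization branch: mix in strided samples of the finer layer below,
    # which keep un-averaged detail (assumed fusion: mean).
    lower = pyramid[layer - 1][::2] if layer > 0 else current
    loc_feat = [(c + l) / 2 for c, l in zip(current, lower)]
    return cls_feat, loc_feat

pyramid = build_pyramid([0, 2, 4, 6, 8, 10, 12, 14], num_layers=3)
cls_feat, loc_feat = decouple(pyramid, layer=1)
```

With this routing the two heads no longer receive the same input: for the middle layer, `cls_feat` is smoothed toward the coarser layer and `loc_feat` is pulled toward the finer one. A real model would use learned, multi-channel fusion rather than a plain mean.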
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Gait Recognition and Analysis
Methods: ALIGN
