PAT: Position-Aware Transformer for Dense Multi-Label Action Detection
Faegheh Sardari, Armin Mustafa, Philip J. B. Jackson, and Adrian Hilton

TL;DR
PAT introduces a position-aware transformer that effectively models complex temporal action dependencies in videos by embedding relative positional encoding and utilizing a non-hierarchical multi-scale approach, leading to state-of-the-art results.
Contribution
The paper proposes a novel non-hierarchical transformer architecture with relative positional encoding for dense multi-label action detection in videos, addressing positional information loss in existing methods.
Findings
Achieves new state-of-the-art mAP on Charades and MultiTHUMOS datasets.
Improves on the previous best results by 1.1% and 0.6% mAP on Charades and MultiTHUMOS, respectively.
Extensive ablation studies validate the effectiveness of each component.
Abstract
We present PAT, a transformer-based network that learns complex temporal co-occurrence action dependencies in a video by exploiting multi-scale temporal features. In existing methods, the self-attention mechanism in transformers loses the temporal positional information, which is essential for robust action detection. To address this issue, we (i) embed relative positional encoding in the self-attention mechanism and (ii) exploit multi-scale temporal relationships by designing a novel non-hierarchical network, in contrast to the recent transformer-based approaches that use a hierarchical structure. We argue that joining the self-attention mechanism with multiple sub-sampling processes in the hierarchical approaches results in increased loss of positional information. We evaluate the performance of our proposed approach on two challenging dense multi-label benchmark datasets, and show…
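To make point (i) concrete, here is a minimal sketch of self-attention with a relative positional term. It is an illustration under stated assumptions, not the authors' implementation: the function name `relative_position_attention`, the scalar-per-offset bias `rel_bias`, and the identity query/key/value projections are all hypothetical simplifications chosen to keep the example short. The sketch shows only the general idea that a term depending on the offset j - i lets attention distinguish timesteps even when their content is identical.

```python
import math

def relative_position_attention(x, rel_bias):
    """Single-head self-attention over x with an additive relative-position bias.

    x:        list of T feature vectors (each a list of d floats)
    rel_bias: list of 2*T - 1 scalars, one per offset j - i in [-(T-1), T-1]
    Query/key/value projections are taken as identity (an assumption for brevity).
    """
    T, d = len(x), len(x[0])
    weights, out = [], []
    for i in range(T):
        # Content score (scaled dot product) plus a bias that depends only on j - i.
        scores = [
            sum(a * b for a, b in zip(x[i], x[j])) / math.sqrt(d)
            + rel_bias[j - i + T - 1]
            for j in range(T)
        ]
        m = max(scores)  # subtract the max for numerical stability
        exp = [math.exp(s - m) for s in scores]
        z = sum(exp)
        w = [e / z for e in exp]
        weights.append(w)
        # Output at timestep i is the attention-weighted sum of all feature vectors.
        out.append([sum(w[j] * x[j][c] for j in range(T)) for c in range(d)])
    return out, weights

# Identical features at every timestep: content alone cannot tell positions apart.
x = [[1.0, 0.5]] * 4
_, no_pos = relative_position_attention(x, [0.0] * 7)
# A hand-picked bias that favours nearby timesteps (hypothetical values).
_, with_pos = relative_position_attention(
    x, [-3.0, -2.0, -1.0, 0.0, -1.0, -2.0, -3.0]
)
```

With a zero bias, every row of `no_pos` is uniform, so the attention carries no temporal ordering. With the distance-dependent bias, each row of `with_pos` peaks at its own timestep and decays with temporal distance. In a trained model the bias values would be learned rather than fixed by hand.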
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Video Surveillance and Tracking Methods
Methods: Network On Network
