Self-Feedback DETR for Temporal Action Detection
Jihwan Kim, Miso Lee, Jae-Pil Heo

TL;DR
This paper introduces Self-DETR, a novel framework that addresses the temporal collapse problem in DETR-based models for temporal action detection by reactivating self-attention modules through cross-attention guidance, leading to improved attention diversity.
Contribution
The paper proposes a new method to mitigate the temporal collapse problem in DETR-based models for TAD by reactivating their self-attention modules with guidance derived from decoder cross-attention maps, which improves detection performance.
Findings
Self-DETR effectively resolves the temporal collapse problem.
The approach maintains high diversity of attention across layers.
Experiments show improved detection accuracy.
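To make "temporal collapse" and "attention diversity" concrete, the sketch below computes two simple proxies on a row-stochastic attention map (each row is one query's softmax distribution over keys): the mean normalized row entropy, and the share of all attention mass received by the single most-attended key. These two metrics and the function names are illustrative choices made here, not necessarily the measures used in the paper.

```python
import math

def mean_normalized_entropy(attn):
    """Mean Shannon entropy of the attention rows, divided by log(N_keys).

    1.0 means every query attends uniformly over the keys; values near 0.0
    mean each query puts almost all of its weight on a single key.
    Assumes at least two keys (log(1) would be zero).
    """
    n_keys = len(attn[0])
    total = 0.0
    for row in attn:
        total += -sum(p * math.log(p) for p in row if p > 0.0)
    return total / (len(attn) * math.log(n_keys))

def top_key_share(attn):
    """Fraction of all attention mass received by the most-attended key.

    A value near 1.0 means (almost) every query looks at the same key,
    i.e. the map has collapsed onto one temporal position.
    """
    col_mass = [sum(col) for col in zip(*attn)]
    return max(col_mass) / sum(col_mass)

# Collapsed map: every query attends almost entirely to key 0.
collapsed = [[0.97, 0.01, 0.01, 0.01] for _ in range(4)]
# Diverse map: queries spread their attention over different keys.
diverse = [[0.25, 0.25, 0.25, 0.25],
           [0.40, 0.30, 0.20, 0.10],
           [0.10, 0.40, 0.40, 0.10],
           [0.30, 0.10, 0.30, 0.30]]

for name, attn in (("collapsed", collapsed), ("diverse", diverse)):
    print(name, round(mean_normalized_entropy(attn), 3), round(top_key_share(attn), 3))
```

Under these proxies, a collapsed self-attention map shows low entropy and a top-key share close to 1, while a diverse map shows the opposite; tracking them per layer is one way to check the "diversity across layers" finding.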
Abstract
Temporal Action Detection (TAD) is challenging but fundamental for real-world video applications. Recently, DETR-based models have been devised for TAD, but they have not yet performed well. In this paper, we point out a problem in the self-attention of DETR for TAD: the attention modules focus on only a few key elements, which we call the temporal collapse problem. It degrades the capability of the encoder and decoder because their self-attention modules play no role. To solve the problem, we propose a novel framework, Self-DETR, which utilizes the cross-attention maps of the decoder to reactivate the self-attention modules. We recover the relationships between encoder features by a simple matrix multiplication of the cross-attention map and its transpose. Likewise, we obtain the relationships among decoder queries. By guiding the collapsed self-attention maps with the calculated guidance maps, we settle down the temporal…
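The guidance step described in the abstract can be sketched as follows. Given a decoder cross-attention map A with N_q query rows over N_k encoder tokens, A^T A (N_k × N_k) relates encoder features that are attended by the same queries, and A A^T (N_q × N_q) relates queries that attend to the same encoder features. The row normalization of the guidance maps and the use of a row-wise KL divergence as the guiding loss are assumptions made for illustration, and the helper names (`guidance_maps`, `kl_guidance_loss`) are hypothetical; this is not the authors' implementation. Plain Python lists are used so the sketch runs without dependencies.

```python
import math

def matmul(a, b):
    """Matrix product of list-of-rows matrices a (n x m) and b (m x p)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def row_normalize(m, eps=1e-12):
    """Rescale each row to sum to 1 (assumed normalization, not from the paper)."""
    return [[v / (sum(row) + eps) for v in row] for row in m]

def guidance_maps(cross_attn):
    """cross_attn: N_q x N_k decoder cross-attention map (rows sum to 1).

    Returns (encoder_guidance, decoder_guidance):
      encoder: normalized A^T A, N_k x N_k, relates encoder features
      decoder: normalized A A^T, N_q x N_q, relates decoder queries
    """
    a_t = transpose(cross_attn)
    enc = row_normalize(matmul(a_t, cross_attn))
    dec = row_normalize(matmul(cross_attn, a_t))
    return enc, dec

def kl_guidance_loss(guide, self_attn, eps=1e-12):
    """Mean row-wise KL(guide || self_attn): a hypothetical loss that pulls
    a collapsed self-attention map toward the guidance map."""
    total = 0.0
    for g_row, s_row in zip(guide, self_attn):
        total += sum(g * math.log((g + eps) / (s + eps))
                     for g, s in zip(g_row, s_row) if g > 0)
    return total / len(guide)

# Toy example: 2 decoder queries attending over 3 encoder tokens.
cross_attn = [[0.7, 0.2, 0.1],
              [0.1, 0.3, 0.6]]
enc_guide, dec_guide = guidance_maps(cross_attn)
# A collapsed encoder self-attention map: every token attends to token 0.
collapsed_self_attn = [[0.98, 0.01, 0.01]] * 3
print(kl_guidance_loss(enc_guide, collapsed_self_attn))
```

In a real model these maps would be batched, multi-head tensors, and the loss would be added to the detection objective; the list-based version only shows the shapes and the direction of the guidance.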
Videos
Self-Feedback DETR for Temporal Action Detection (YouTube)
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Video Surveillance and Tracking Methods
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Adam · Label Smoothing · Layer Normalization · Feedforward Network · Softmax
