Prediction-Feedback DETR for Temporal Action Detection
Jihwan Kim, Miso Lee, Cheol-Ho Cho, Jihyun Lee, Jae-Pil Heo

TL;DR
This paper introduces Prediction-Feedback DETR (Pred-DETR), a novel framework that addresses attention collapse in DETR-based temporal action detection, significantly improving performance on multiple benchmarks.
Contribution
It proposes a new prediction-feedback mechanism to mitigate attention collapse in DETR for TAD, aligning cross- and self-attention with predictions for better accuracy.
Findings
Achieves state-of-the-art results on THUMOS14, ActivityNet-v1.3, HACS, and FineAction.
Effectively alleviates attention collapse in DETR-based TAD methods.
Demonstrates the importance of prediction-guided feedback in transformer models.
Abstract
Temporal Action Detection (TAD) is fundamental yet challenging for real-world video applications. Leveraging the unique benefits of transformers, various DETR-based approaches have been adopted in TAD. However, it has recently been identified that the attention collapse in self-attention causes the performance degradation of DETR for TAD. Building upon previous research, this paper newly addresses the attention collapse problem in cross-attention within DETR-based TAD methods. Moreover, our findings reveal that cross-attention exhibits patterns distinct from predictions, indicating a short-cut phenomenon. To resolve this, we propose a new framework, Prediction-Feedback DETR (Pred-DETR), which utilizes predictions to restore the collapse and align the cross- and self-attention with predictions. Specifically, we devise novel prediction-feedback objectives using guidance from the relations…
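The abstract is cut off before it describes the objectives, so the following is only a minimal sketch of what "aligning self-attention with predictions" could look like. It is not the authors' implementation. It assumes that the relation between two predictions is their temporal IoU and that alignment is measured with a KL divergence. Both are illustrative choices, and the function names `pairwise_temporal_iou` and `attention_alignment_loss` are hypothetical.

```python
import numpy as np


def pairwise_temporal_iou(segments):
    """Return the (N, N) temporal IoU matrix for N [start, end] segments."""
    seg = np.asarray(segments, dtype=float)
    # Overlap of every pair of segments, clipped at zero when they are disjoint.
    start = np.maximum(seg[:, None, 0], seg[None, :, 0])
    end = np.minimum(seg[:, None, 1], seg[None, :, 1])
    inter = np.clip(end - start, 0.0, None)
    length = seg[:, 1] - seg[:, 0]
    union = length[:, None] + length[None, :] - inter
    return inter / np.maximum(union, 1e-8)


def attention_alignment_loss(attn, segments, eps=1e-8):
    """Mean KL(target || attn) over queries (assumed objective, not the paper's).

    attn:     (N, N) row-stochastic self-attention map over N queries.
    segments: (N, 2) predicted [start, end] for the same N queries.
    """
    iou = pairwise_temporal_iou(segments)
    # Row-normalize the IoU matrix so each row is a target distribution.
    target = iou / np.maximum(iou.sum(axis=1, keepdims=True), eps)
    kl = target * (np.log(target + eps) - np.log(np.asarray(attn) + eps))
    return float(kl.sum(axis=1).mean())


# Toy example: three predicted segments and a fully uniform attention map,
# used here as a crude stand-in for a collapsed map.
segments = [[0.0, 10.0], [5.0, 15.0], [20.0, 30.0]]
uniform_attn = np.full((3, 3), 1.0 / 3.0)
loss_uniform = attention_alignment_loss(uniform_attn, segments)
```

Under these assumptions the loss is zero when the attention map equals the row-normalized IoU matrix and grows as the map drifts toward a pattern unrelated to the predictions.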
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Gait Recognition and Analysis
Methods: Attention Is All You Need · Linear Layer · Adam · Layer Normalization · Feedforward Network · Position-Wise Feed-Forward Layer · Dense Connections · Residual Connection · Multi-Head Attention · Convolution
