Self-Feedback DETR for Temporal Action Detection

Jihwan Kim; Miso Lee; Jae-Pil Heo

arXiv:2308.10570 · cs.CV · August 22, 2023 · 1 citation

TL;DR

This paper introduces Self-DETR, a novel framework that addresses the temporal collapse problem in DETR-based models for temporal action detection by reactivating self-attention modules through cross-attention guidance, leading to improved attention diversity.

Contribution

The paper proposes a new method to mitigate the temporal collapse in DETR models for TAD by reactivating self-attention using cross-attention maps, enhancing model performance.

Findings

01 Self-DETR effectively resolves the temporal collapse problem.

02 The approach maintains high diversity of attention across layers.

03 Experiments show improved detection accuracy.

Abstract

Temporal Action Detection (TAD) is challenging but fundamental for real-world video applications. Recently, DETR-based models have been devised for TAD, but they have not yet performed well. In this paper, we point out a problem in the self-attention of DETR for TAD: the attention modules focus on only a few key elements, which we call the temporal collapse problem. It degrades the capability of the encoder and decoder, since their self-attention modules play no role. To solve the problem, we propose a novel framework, Self-DETR, which utilizes cross-attention maps of the decoder to reactivate self-attention modules. We recover the relationship between encoder features by a simple matrix multiplication of the cross-attention map and its transpose. Likewise, we also obtain the relationship among decoder queries. By guiding collapsed self-attention maps with the calculated guidance map, we settle down the temporal…
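The matrix products described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the toy cross-attention map, and the row normalization step are all assumptions made for illustration, and the abstract does not say how the guidance maps are normalized or applied during training. With a cross-attention map A of shape (queries × encoder features), AᵀA relates encoder features to each other and AAᵀ relates decoder queries to each other.

```python
def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Plain matrix product of two list-of-rows matrices."""
    b_t = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_t] for row in a]


def row_normalize(m):
    """Scale each row to sum to 1 (assumed normalization, not from the paper)."""
    out = []
    for row in m:
        s = sum(row)
        out.append([v / s if s else 0.0 for v in row])
    return out


def guidance_maps(cross_attn):
    """Build feature-to-feature and query-to-query maps from a cross-attention map.

    cross_attn has one row per decoder query and one column per encoder feature.
    """
    a_t = transpose(cross_attn)
    enc_guide = matmul(a_t, cross_attn)   # (features x features): A^T A
    dec_guide = matmul(cross_attn, a_t)   # (queries x queries):   A A^T
    return row_normalize(enc_guide), row_normalize(dec_guide)


# Hypothetical toy map: 2 queries attending over 3 encoder features.
cross_attn = [
    [0.5, 0.5, 0.0],  # query 0 attends to features 0 and 1
    [0.0, 0.0, 1.0],  # query 1 attends to feature 2
]
enc_guide, dec_guide = guidance_maps(cross_attn)
print(enc_guide)  # [[0.5, 0.5, 0.0], [0.5, 0.5, 0.0], [0.0, 0.0, 1.0]]
print(dec_guide)  # [[1.0, 0.0], [0.0, 1.0]]
```

In this toy case, features 0 and 1 are attended to by the same query, so the feature-to-feature map links them, while feature 2 relates only to itself. The two queries share no features, so the query-to-query map is the identity.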

Videos

Self-Feedback DETR for Temporal Action Detection · YouTube

Taxonomy

Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Video Surveillance and Tracking Methods

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Adam · Label Smoothing · Layer Normalization · Feedforward Network · Softmax