Classification Matters: Improving Video Action Detection with Class-Specific Attention

Jinsung Lee; Taeoh Kim; Inwoong Lee; Minho Shim; Dongyoon Wee; Minsu Cho; Suha Kwak

arXiv:2407.19698·cs.CV·September 12, 2024

TL;DR

This paper introduces a class-specific attention mechanism for video action detection that emphasizes relevant context over actor regions, leading to improved accuracy and efficiency.

Contribution

The paper proposes assigning a dedicated query to each action class, so that the model can dynamically decide which context to attend to for that class, improving classification performance in video action detection.

Findings

01. Outperforms existing methods on three benchmarks

02. Uses fewer parameters and less computation

03. Effectively emphasizes context for better classification

Abstract

Video action detection (VAD) aims to detect actors and classify their actions in a video. We observe that VAD suffers more from classification than from localization of actors. Hence, we analyze how prevailing methods form features for classification and find that they prioritize actor regions, yet often overlook the contextual information necessary for accurate classification. Accordingly, we propose to reduce the bias toward the actor and encourage attention to the context that is relevant to each action class. By assigning a class-dedicated query to each action class, our model can dynamically determine where to focus for effective classification. The proposed model demonstrates superior performance on three challenging benchmarks with significantly fewer parameters and less computation.
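To make the idea of class-dedicated queries concrete, here is a minimal sketch in plain Python. It is not the authors' architecture, which the abstract does not describe in detail. It only illustrates the general pattern: each action class owns one query vector, each query produces its own softmax attention map over the video feature locations, and each class is scored from its own pooled feature. The function names, the scaled dot-product form, the per-class sigmoid scoring, and the random toy inputs are all assumptions made for illustration.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def class_specific_attention(class_queries, features):
    """One attention map per class (hypothetical helper, not from the paper).

    class_queries: C query vectors of size D, one per action class.
    features:      N feature vectors of size D (flattened video locations).
    Returns (attn_maps, class_feats): C x N weights and C x D pooled features.
    """
    dim = len(features[0])
    scale = 1.0 / math.sqrt(dim)  # assumed scaled dot-product attention
    attn_maps, class_feats = [], []
    for query in class_queries:
        # Each class decides on its own where to look in the feature map.
        weights = softmax([dot(query, f) * scale for f in features])
        pooled = [sum(w * f[j] for w, f in zip(weights, features))
                  for j in range(dim)]
        attn_maps.append(weights)
        class_feats.append(pooled)
    return attn_maps, class_feats


def classify(class_feats, class_weights):
    """Score each class from its own pooled feature (assumed sigmoid scoring)."""
    return [1.0 / (1.0 + math.exp(-dot(f, w)))
            for f, w in zip(class_feats, class_weights)]


# Toy inputs: random values stand in for learned queries and video features.
random.seed(0)
num_classes, num_locations, dim = 3, 6, 4
features = [[random.gauss(0, 1) for _ in range(dim)]
            for _ in range(num_locations)]
class_queries = [[random.gauss(0, 1) for _ in range(dim)]
                 for _ in range(num_classes)]
class_weights = [[random.gauss(0, 1) for _ in range(dim)]
                 for _ in range(num_classes)]

attn_maps, class_feats = class_specific_attention(class_queries, features)
scores = classify(class_feats, class_weights)
```

In this sketch the attention maps differ per class because the queries differ, which is the property the abstract highlights: the model is free to attend to context outside the actor region when that context is what distinguishes a given class.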

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Human Pose and Action Recognition

Methods: Softmax · Attention Is All You Need · Focus