STMixer: A One-Stage Sparse Action Detector

Tao Wu; Mengqi Cao; Ziteng Gao; Gangshan Wu; Limin Wang

arXiv:2303.15879 · cs.CV · March 29, 2023 · 1 citation

TL;DR

STMixer is a one-stage sparse action detector that combines query-based adaptive feature sampling with dual-branch feature mixing to detect actions in videos end to end, reporting state-of-the-art results on AVA, UCF101-24, and JHMDB.

Contribution

The paper introduces STMixer, an end-to-end action detection framework built on adaptive feature sampling and dynamic feature mixing, which targets the inferior performance and slower convergence of earlier query-based detectors.

Findings

01. Achieves state-of-the-art results on the AVA, UCF101-24, and JHMDB datasets.

02. Outperforms traditional two-stage detectors.

03. Provides an efficient, end-to-end solution for video action detection.

Abstract

Traditional video action detectors typically adopt a two-stage pipeline, where a person detector is first employed to generate actor boxes and then 3D RoIAlign is used to extract actor-specific features for classification. This detection paradigm requires multi-stage training and inference, and cannot capture context information outside the bounding box. Recently, a few query-based action detectors have been proposed to predict action instances in an end-to-end manner. However, they still lack adaptability in feature sampling and decoding, and thus suffer from inferior performance or slower convergence. In this paper, we propose a new one-stage sparse action detector, termed STMixer. STMixer is based on two core designs. First, we present a query-based adaptive feature sampling module, which endows our STMixer with the flexibility of mining a set of discriminative features from…
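The abstract's point that box-restricted feature extraction cannot see context outside the actor box can be illustrated with a toy example. The sketch below is not STMixer's implementation: the truncated abstract does not spell out the sampling mechanism, so point-based bilinear sampling is an assumption here, the function names (`bilinear_sample`, `box_pool_mean`, `sample_points`) are hypothetical, `box_pool_mean` is a simplified stand-in for RoIAlign rather than the real operator, and the offsets are set by hand rather than predicted from a query. It uses a single-channel 2D map instead of spatiotemporal features.

```python
from math import floor


def bilinear_sample(fmap, x, y):
    """Bilinearly interpolate a 2D feature map at continuous (x, y)."""
    h, w = len(fmap), len(fmap[0])
    # Clamp so that points falling outside the map are still valid.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(floor(x)), int(floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    top = fmap[y0][x0] * (1 - dx) + fmap[y0][x1] * dx
    bottom = fmap[y1][x0] * (1 - dx) + fmap[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def box_pool_mean(fmap, box):
    """Average the cells inside an inclusive integer box (x0, y0, x1, y1).

    Simplified stand-in for RoIAlign: it only ever reads inside the box.
    """
    x0, y0, x1, y1 = box
    cells = [fmap[y][x] for y in range(y0, y1 + 1) for x in range(x0, x1 + 1)]
    return sum(cells) / len(cells)


def sample_points(fmap, center, offsets):
    """Sample the map at center + offset for each (ox, oy) offset.

    In a query-based detector the offsets would be predicted per query;
    here they are hand-picked (assumption for illustration).
    """
    cx, cy = center
    return [bilinear_sample(fmap, cx + ox, cy + oy) for ox, oy in offsets]


# Toy 6x6 feature map: all zeros except one "context cue" at x=5, y=1,
# which lies outside the actor box below.
fmap = [[0.0] * 6 for _ in range(6)]
fmap[1][5] = 1.0

actor_box = (1, 1, 3, 4)
box_feature = box_pool_mean(fmap, actor_box)  # never reads the cue

center = (2.0, 2.5)  # centre of the actor box
offsets = [(0.0, 0.0), (3.0, -1.5), (2.5, -1.5)]  # two points leave the box
point_features = sample_points(fmap, center, offsets)

print(box_feature)      # 0.0
print(point_features)   # [0.0, 1.0, 0.5]
```

The box-pooled feature stays at zero because the cue is outside the box, while two of the sampled points pick it up. This only illustrates why sampling that is not confined to the box can reach surrounding context; how STMixer chooses its sampling locations is not stated in the text above.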

Taxonomy

Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Multimodal Machine Learning Applications

Methods: RoIAlign