STMixer: A One-Stage Sparse Action Detector
Tao Wu, Mengqi Cao, Ziteng Gao, Gangshan Wu, Limin Wang

TL;DR
STMixer is a one-stage sparse action detector that uses query-based adaptive feature sampling and dual-branch feature mixing to detect actions in videos end to end, reporting state-of-the-art results on AVA, UCF101-24, and JHMDB.
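The query-based adaptive sampling idea can be illustrated with a minimal NumPy sketch. It rests on several assumptions that do not come from the paper: a single-scale feature map of shape (T, H, W, C), a query box given as (cx, cy, w, h), and per-point offsets (dx, dy, t) supplied directly instead of being predicted from the content query. The names `adaptive_sample` and `bilinear_sample` are hypothetical. Each point is read with bilinear interpolation in space and linear interpolation in time. Points are free to land outside the box, and this is how sampling can pick up context.

```python
import numpy as np

def bilinear_sample(feat, x, y):
    """Bilinearly sample feat of shape (H, W, C) at continuous (x, y); zero padding outside."""
    H, W, C = feat.shape
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    out = np.zeros(C)
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= yi < H and 0 <= xi < W:
                weight = (1 - abs(x - xi)) * (1 - abs(y - yi))
                out += weight * feat[yi, xi]
    return out

def adaptive_sample(video_feat, box, offsets):
    """Sample one feature vector per offset around a query box (hypothetical helper).

    video_feat: (T, H, W, C) single-scale spatiotemporal features (a simplification).
    box: (cx, cy, w, h) in feature-map pixels.
    offsets: (P, 3) rows of (dx, dy, t); dx and dy are scaled by the box size and t is a
        continuous frame index. Assumed input: a real detector would predict these
        from the content query.
    Returns an array of shape (P, C).
    """
    T = video_feat.shape[0]
    cx, cy, w, h = box
    samples = []
    for dx, dy, t in offsets:
        x, y = cx + dx * w, cy + dy * h          # may fall outside the box
        t = float(np.clip(t, 0, T - 1))
        t0 = int(np.floor(t))
        t1 = min(t0 + 1, T - 1)
        a = t - t0                               # linear interpolation weight in time
        f0 = bilinear_sample(video_feat[t0], x, y)
        f1 = bilinear_sample(video_feat[t1], x, y)
        samples.append((1 - a) * f0 + a * f1)
    return np.stack(samples)

# Usage with random features and hand-picked offsets.
rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 16, 16, 8))       # T=4, H=W=16, C=8
box = (8.0, 8.0, 4.0, 6.0)                       # box spans x in [6, 10], y in [5, 11]
offsets = np.array([[0.0, 0.0, 1.0],             # box center, frame 1
                    [1.5, -0.2, 2.5],            # outside the box, between frames 2 and 3
                    [-0.4, 0.3, 0.0]])
sampled = adaptive_sample(feat, box, offsets)
print(sampled.shape)  # (3, 8)
```

The second offset reads a feature at x = 14, well to the right of the box. A fixed RoIAlign grid would never reach that location.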
Contribution
The paper introduces STMixer, a novel end-to-end action detection framework built on adaptive feature sampling and dynamic feature mixing. These components target the inferior performance or slow convergence of earlier query-based detectors.
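In dynamic feature mixing, the mixing weights are generated from each query instead of being fixed after training. The sketch below illustrates only that shared idea. It follows an AdaMixer-style scheme that applies channel mixing and then point mixing. It is not STMixer's dual-branch module. The class `DynamicMixer`, its randomly initialised weight generators, and all shapes are hypothetical.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """LayerNorm over the last axis, without learned scale or shift."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

class DynamicMixer:
    """Hypothetical query-conditioned mixer: mixing matrices are generated from the query."""

    def __init__(self, d_query, n_points, n_channels, n_out_points, rng):
        self.P, self.C, self.Po = n_points, n_channels, n_out_points
        # Linear weight generators (random here; they would be learned in practice).
        self.gen_channel = rng.standard_normal((d_query, n_channels * n_channels)) * 0.02
        self.gen_point = rng.standard_normal((d_query, n_points * n_out_points)) * 0.02
        self.out_proj = rng.standard_normal((n_channels * n_out_points, d_query)) * 0.02

    def __call__(self, query, sampled):
        """query: (d_query,); sampled: (P, C) features from adaptive sampling."""
        # Channel mixing with a (C, C) matrix generated from this query.
        Mc = (query @ self.gen_channel).reshape(self.C, self.C)
        x = np.maximum(layer_norm(sampled @ Mc), 0.0)     # (P, C)
        # Point mixing with a (P, Po) matrix, also generated from the query.
        Mp = (query @ self.gen_point).reshape(self.P, self.Po)
        x = np.maximum(layer_norm(x.T @ Mp), 0.0)         # (C, Po)
        # Flatten, project back to the query dimension, and update residually.
        return query + x.reshape(-1) @ self.out_proj

rng = np.random.default_rng(0)
mixer = DynamicMixer(d_query=32, n_points=6, n_channels=8, n_out_points=4, rng=rng)
query = rng.standard_normal(32)
sampled = rng.standard_normal((6, 8))
new_query = mixer(query, sampled)
print(new_query.shape)  # (32,)
```

The mixing matrices depend on the query, so each actor query can weight its sampled points and channels differently. A static MLP, by contrast, would apply the same weights to every query.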
Findings
Achieves state-of-the-art results on AVA, UCF101-24, and JHMDB datasets.
Demonstrates superior performance over traditional two-stage detectors.
Provides an efficient, end-to-end solution for video action detection.
Abstract
Traditional video action detectors typically adopt a two-stage pipeline. A person detector first generates actor boxes, and 3D RoIAlign then extracts actor-specific features for classification. This paradigm requires multi-stage training and inference, and it cannot capture context information outside the bounding box. Recently, a few query-based action detectors have been proposed to predict action instances in an end-to-end manner. However, they still lack adaptability in feature sampling and decoding, and thus suffer from inferior performance or slower convergence. In this paper, we propose a new one-stage sparse action detector, termed STMixer. STMixer is based on two core designs. First, we present a query-based adaptive feature sampling module, which endows our STMixer with the flexibility of mining a set of discriminative features from…
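For contrast, the two-stage baseline pools actor features with 3D RoIAlign. The sketch below assumes one common formulation. The box is shared across all frames, features are averaged over time, and a simplified 2D RoIAlign then takes a single bilinear sample at each output bin center. Real implementations usually average several samples per bin. The function names are hypothetical. The final check edits only features outside the box and shows that the pooled output stays the same, which is the context limitation described in the abstract.

```python
import numpy as np

def bilinear_sample(feat, x, y):
    """Bilinearly sample feat of shape (H, W, C) at continuous (x, y); zero padding outside."""
    H, W, C = feat.shape
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    out = np.zeros(C)
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= yi < H and 0 <= xi < W:
                out += (1 - abs(x - xi)) * (1 - abs(y - yi)) * feat[yi, xi]
    return out

def roi_align_3d(video_feat, box, out_size=2):
    """Pool (T, H, W, C) features inside box = (x1, y1, x2, y2), shared by all frames.

    Averages over time, then takes one bilinear sample at each bin center
    (a simplified RoIAlign with one sample per bin). Returns (out_size, out_size, C).
    """
    feat = video_feat.mean(axis=0)               # temporal average: (H, W, C)
    x1, y1, x2, y2 = box
    bw, bh = (x2 - x1) / out_size, (y2 - y1) / out_size
    out = np.zeros((out_size, out_size, feat.shape[-1]))
    for i in range(out_size):
        for j in range(out_size):
            out[i, j] = bilinear_sample(feat, x1 + (j + 0.5) * bw, y1 + (i + 0.5) * bh)
    return out

rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 16, 16, 8))
box = (4.0, 4.0, 8.0, 8.0)
pooled = roi_align_3d(feat, box)

# Change only features far to the right of the box; the pooled result is unchanged.
feat_ctx = feat.copy()
feat_ctx[:, :, 12:, :] += 100.0
pooled_ctx = roi_align_3d(feat_ctx, box)
print(np.allclose(pooled, pooled_ctx))  # True
```

Bilinear interpolation is linear, so averaging over time before sampling produces the same result as sampling every frame and averaging afterwards.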
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Multimodal Machine Learning Applications
Methods: RoIAlign
