Actor-identified Spatiotemporal Action Detection -- Detecting Who Is Doing What in Videos
Fan Yang, Norimichi Ukita, Sakriani Sakti, Satoshi Nakamura

TL;DR
This paper introduces Actor-identified Spatiotemporal Action Detection (ASAD), a new task that localizes actions in space and time while identifying the actor, supported by a new dataset, evaluation metrics, and improved tracking methods.
Contribution
The paper proposes the novel ASAD task, builds a dedicated dataset, develops evaluation metrics, and improves multi-object tracking (MOT) strategies so that actors can be identified in spatiotemporal action detection.
Findings
Created the first ASAD dataset with actor IDs and action annotations.
Proposed new evaluation metrics for multi-label actions and actor identification.
Improved MOT data association strategies lead to better ASAD performance.
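The summary does not describe the paper's improved association strategy. To show what a data-association step does in tracking-by-detection, here is a generic, minimal sketch, not the authors' method. It assumes axis-aligned boxes `(x1, y1, x2, y2)`, and it uses a hypothetical `associate` helper with an illustrative IoU threshold of 0.3. Each detection that matches an existing track inherits that track's actor ID. Unmatched detections start new IDs.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(
    tracks: Dict[int, Box],
    detections: List[Box],
    next_id: int,
    iou_threshold: float = 0.3,  # illustrative value, not from the paper
) -> Tuple[Dict[int, Box], int, List[int]]:
    """Hypothetical greedy IoU matcher.

    Returns the updated tracks, the next free actor ID, and the actor ID
    assigned to each detection. Tracks that match no detection in this
    frame are dropped (no lost-track buffer or re-identification).
    """
    # Score every (track, detection) pair and visit the best overlaps first.
    pairs = sorted(
        ((iou(tb, db), tid, di)
         for tid, tb in tracks.items()
         for di, db in enumerate(detections)),
        reverse=True,
    )
    assigned: List[int] = [-1] * len(detections)
    used_tracks = set()
    for score, tid, di in pairs:
        if score < iou_threshold:
            break  # remaining pairs overlap even less
        if tid in used_tracks or assigned[di] != -1:
            continue  # each track and each detection is matched at most once
        assigned[di] = tid
        used_tracks.add(tid)
    # Unmatched detections become new actors with fresh IDs.
    new_tracks: Dict[int, Box] = {}
    for di, db in enumerate(detections):
        if assigned[di] == -1:
            assigned[di] = next_id
            next_id += 1
        new_tracks[assigned[di]] = db
    return new_tracks, next_id, assigned
```

To keep actor IDs stable across a video, call `associate` once per frame and feed the returned `tracks` and `next_id` into the next call. Greedy matching keeps the sketch short. The Hungarian algorithm is the usual optimal alternative, and practical trackers add motion and appearance cues on top of IoU.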
Abstract
The success of deep learning on video Action Recognition (AR) has motivated researchers to progressively advance related tasks from the coarse level to the fine-grained level. Conventional AR predicts only one action label for an entire video. Temporal Action Detection (TAD) goes further and estimates the start and end time of each action in a video. Taking TAD a step further, Spatiotemporal Action Detection (SAD) localizes each action both spatially and temporally. However, SAD generally ignores who performs the action, even though identifying the actor can also be important. To this end, we propose a novel task, Actor-identified Spatiotemporal Action Detection (ASAD), to bridge the gap between SAD and actor identification. In ASAD, we not only detect the spatiotemporal boundary for instance-level action but also assign…
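To make "who is doing what, and when" concrete, here is a minimal sketch of what ASAD-style output could look like. It assumes a hypothetical frame-level format and is not the dataset's actual annotation schema. Each record carries a box, an actor ID, and a set of action labels, since actions are multi-label. The hypothetical `action_segments` helper groups the records into contiguous temporal segments per (actor, action) pair.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class FrameDetection:
    """Hypothetical per-frame ASAD record (field names are illustrative)."""
    frame: int                                # frame index
    actor_id: int                             # who
    box: Tuple[float, float, float, float]    # where: (x1, y1, x2, y2)
    actions: FrozenSet[str]                   # what: multi-label action set


def action_segments(
    dets: List[FrameDetection],
) -> Dict[Tuple[int, str], List[Tuple[int, int]]]:
    """Group detections into (start, end) frame runs per (actor, action)."""
    frames_by_key: Dict[Tuple[int, str], List[int]] = {}
    for d in dets:
        for action in d.actions:
            frames_by_key.setdefault((d.actor_id, action), []).append(d.frame)

    segments: Dict[Tuple[int, str], List[Tuple[int, int]]] = {}
    for key, frames in frames_by_key.items():
        ordered = sorted(set(frames))
        runs: List[Tuple[int, int]] = []
        start = prev = ordered[0]
        for f in ordered[1:]:
            if f != prev + 1:          # a gap closes the current segment
                runs.append((start, prev))
                start = f
            prev = f
        runs.append((start, prev))
        segments[key] = runs
    return segments
```

For example, if actor 1 is labeled "walk" in frames 0 to 1 and "talk" in frames 1, 2, and 4, the result has two "talk" segments for that actor, `(1, 2)` and `(4, 4)`. One actor can therefore perform overlapping actions, and the same action can recur after a gap.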
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Multimodal Machine Learning Applications
