Action Search: Spotting Actions in Videos and Its Application to Temporal Action Localization
Humam Alwassel, Fabian Caba Heilbron, Bernard Ghanem

TL;DR
This paper introduces Action Search, a recurrent neural network model inspired by the way humans spot actions, which locates actions in a video while observing only a small portion of it. The paper also presents a new dataset of human search sequences.
Contribution
It proposes a novel action spotting problem, a human-inspired RNN approach, and the Human Searches dataset for training and evaluation.
Findings
Observes only 17.3% of a video on average while searching
Achieves 30.8% mAP in temporal action localization
Demonstrates efficient and accurate action spotting
Abstract
State-of-the-art temporal action detectors inefficiently search the entire video for specific actions. Despite the encouraging progress these methods achieve, it is crucial to design automated approaches that only explore parts of the video which are the most relevant to the actions being searched for. To address this need, we propose the new problem of action spotting in video, which we define as finding a specific action in a video while observing a small portion of that video. Inspired by the observation that humans are extremely efficient and accurate in spotting and finding action instances in video, we propose Action Search, a novel Recurrent Neural Network approach that mimics the way humans spot actions. Moreover, to address the absence of data recording the behavior of human annotators, we put forward the Human Searches dataset, which compiles the search sequences employed by…
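The action spotting setup defined in the abstract (find an action while observing only a small portion of the video) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `spot_action`, `stride_policy`, and `observed_fraction` are hypothetical helpers, and the fixed-stride policy is only a stand-in for the paper's learned recurrent model, whose details are not given in this summary.

```python
from typing import Callable, List, Tuple


def stride_policy(current: int, visited: List[int], stride: int = 100) -> int:
    """Hypothetical stand-in policy: always jump forward by a fixed stride.

    A learned model would instead predict the next location from what it
    has observed so far; this placeholder ignores the history entirely.
    """
    return current + stride


def spot_action(
    num_frames: int,
    action_span: Tuple[int, int],
    next_location: Callable[[int, List[int]], int],
    start: int = 0,
    max_steps: int = 50,
) -> Tuple[bool, List[int]]:
    """Jump through a video until a frame inside the action is observed.

    Returns whether the action was spotted and the list of visited frames.
    """
    lo, hi = action_span
    current = start
    visited = [current]
    for _ in range(max_steps):
        if lo <= current <= hi:
            return True, visited
        # Ask the policy where to look next, clamped to the video bounds.
        proposal = next_location(current, visited)
        current = max(0, min(num_frames - 1, proposal))
        visited.append(current)
    return lo <= current <= hi, visited


def observed_fraction(visited: List[int], num_frames: int) -> float:
    """Fraction of distinct frames observed during the search."""
    return len(set(visited)) / num_frames
```

For a 1000-frame video with an action spanning frames 430 to 520, `spot_action(1000, (430, 520), stride_policy)` visits frames 0, 100, 200, 300, 400, and 500 and stops at frame 500, so `observed_fraction` is 0.006. This toy policy observes isolated frames, so the number is not comparable to the 17.3% reported in the paper; it only shows how such a ratio could be computed.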
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Video Analysis and Summarization
