Human Action Localization with Sparse Spatial Supervision
Philippe Weinzaepfel, Xavier Martin, Cordelia Schmid

TL;DR
This paper presents an approach to spatio-temporal human action localization that needs only sparse spatial supervision. Given high-quality human tubes and temporal annotations, a single spatially annotated frame per action instance is enough to perform on par with fully supervised methods.
Contribution
The method introduces a way to perform spatio-temporal action localization with only sparse spatial annotations, reducing annotation effort while maintaining accuracy.
Findings
Performs on par with fully supervised methods (one bounding box per frame) on UCF-Sports, J-HMDB and UCF-101.
Introduces DALY, a large-scale dataset for realistic action localization.
Learns effective action detectors, based on dense trajectories or CNNs, from only one spatially annotated frame per instance.
Abstract
We introduce an approach for spatio-temporal human action localization using sparse spatial supervision. Our method leverages the large amount of annotated humans available today and extracts human tubes by combining a state-of-the-art human detector with a tracking-by-detection approach. Given these high-quality human tubes and temporal supervision, we select positive and negative tubes with very sparse spatial supervision, i.e., only one spatially annotated frame per instance. The selected tubes allow us to effectively learn a spatio-temporal action detector based on dense trajectories or CNNs. We conduct experiments on existing action localization benchmarks: UCF-Sports, J-HMDB and UCF-101. Our results show that our approach, despite using sparse spatial supervision, performs on par with methods using full supervision, i.e., one bounding box annotation per frame. To further validate…
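The tube selection step described in the abstract can be illustrated with a minimal sketch. It assumes that a candidate human tube is labeled by its overlap (intersection over union) with the single annotated box in the annotated frame. The `Tube` class, the `select_tubes` function, the two thresholds, and the choice to skip tubes that do not cover the annotated frame are all hypothetical and chosen for illustration; the abstract does not specify them.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class Tube:
    """Hypothetical container: one human track as frame index -> box."""
    boxes: Dict[int, Box]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_tubes(
    tubes: List[Tube],
    annotated_frame: int,
    annotated_box: Box,
    pos_thresh: float = 0.5,  # assumed value, not from the paper
    neg_thresh: float = 0.3,  # assumed value, not from the paper
) -> Tuple[List[Tube], List[Tube]]:
    """Split tubes into positives and negatives using one annotated frame."""
    positives, negatives = [], []
    for tube in tubes:
        box = tube.boxes.get(annotated_frame)
        if box is None:
            # Assumption: tubes that miss the annotated frame are left unlabeled.
            continue
        overlap = iou(box, annotated_box)
        if overlap >= pos_thresh:
            positives.append(tube)
        elif overlap <= neg_thresh:
            negatives.append(tube)
        # Tubes with intermediate overlap are treated as ambiguous and ignored.
    return positives, negatives
```

In this sketch, the selected positives and negatives would then serve as training examples for the action detector, while tubes with intermediate overlap are excluded to avoid noisy labels.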
Taxonomy
Topics: Human Pose and Action Recognition · Gait Recognition and Analysis · Video Surveillance and Tracking Methods
