A flexible model for training action localization with varying levels of supervision
Guilhem Chéron, Jean-Baptiste Alayrac, Ivan Laptev, Cordelia Schmid

TL;DR
This paper introduces a flexible discriminative clustering framework for action localization in videos that effectively combines various levels of supervision, reducing manual annotation effort while maintaining competitive accuracy.
Contribution
The proposed model unifies different weak supervision types in a single framework, enabling joint learning and improved performance with minimal fully supervised data.
Findings
Competitive results on UCF101-24 and DALY datasets.
Adding a few fully supervised examples yields significant performance gains.
Effective integration of diverse supervision levels in training.
Abstract
Spatio-temporal action detection in videos is typically addressed in a fully-supervised setup with manual annotation of training videos required at every frame. Since such annotation is extremely tedious and prohibits scalability, there is a clear need to minimize the amount of manual supervision. In this work we propose a unifying framework that can handle and combine varying types of less-demanding weak supervision. Our model is based on discriminative clustering and integrates different types of supervision as constraints on the optimization. We investigate applications of such a model to training setups with alternative supervisory signals ranging from video-level class labels to the full per-frame annotation of action bounding boxes. Experiments on the challenging UCF101-24 and DALY datasets demonstrate competitive performance of our method at a fraction of supervision used by…
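To make "discriminative clustering with supervision as constraints" concrete, here is a minimal sketch under stated assumptions. It is not the authors' implementation: it uses a generic ridge-regression clustering cost in which the linear classifier is eliminated in closed form, and it encodes supervision only as a mask of permitted assignments. The function names, the regularizer `lam`, and the background-column convention are all hypothetical choices made for illustration.

```python
import numpy as np


def clustering_cost(X, Y, lam=1e-3):
    """Ridge-regression clustering cost for features X (N x d) and
    assignments Y (N x K), with the linear classifier W solved in closed form.

    Equals min_W 1/(2N) * ||Y - X W||_F^2 + lam/2 * ||W||_F^2.
    """
    n, d = X.shape
    A = X.T @ X + n * lam * np.eye(d)
    # B = I - X (X^T X + N*lam*I)^{-1} X^T
    B = np.eye(n) - X @ np.linalg.solve(A, X.T)
    return float(np.trace(Y.T @ B @ Y)) / (2 * n)


def allowed_assignments(video_labels, num_classes, full_labels=None):
    """Boolean mask (N x (K+1)) of permitted assignments; last column = background.

    Assumed encoding (hypothetical):
    - video-level label: a sample may take its video's class or background;
    - full annotation: full_labels[i] pins sample i to exactly one column.
    """
    n = len(video_labels)
    mask = np.zeros((n, num_classes + 1), dtype=bool)
    mask[np.arange(n), video_labels] = True   # class of the containing video
    mask[:, num_classes] = True               # background is always allowed
    for i, c in (full_labels or {}).items():
        mask[i] = False
        mask[i, c] = True                     # fully supervised sample is fixed
    return mask


def is_feasible(Y, mask):
    """Check that each sample has exactly one label and none is forbidden."""
    one_per_row = np.all(Y.sum(axis=1) == 1)
    return bool(one_per_row and np.all(Y[~mask] == 0))
```

In this sketch, stronger supervision shrinks the set of feasible assignment matrices `Y`, and the cost is then minimized over that set. The mask covers only per-sample exclusions; a constraint such as "at least one sample per video carries the video's label" would need an additional check, and the optimizer itself is omitted.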
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Robot Manipulation and Learning
