A flexible model for training action localization with varying levels of supervision

Guilhem Chéron, Jean-Baptiste Alayrac, Ivan Laptev, Cordelia Schmid

arXiv:1806.11328 · cs.CV · November 29, 2018 · 25 cites

TL;DR

This paper introduces a flexible discriminative clustering framework for action localization in videos that effectively combines various levels of supervision, reducing manual annotation effort while maintaining competitive accuracy.

Contribution

The proposed model unifies different weak supervision types in a single framework, enabling joint learning and improved performance with minimal fully supervised data.
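To illustrate how several supervision levels can coexist in one optimization, here is a minimal sketch. It assumes each video is represented by candidate tracklets, each assigned to one of K classes with class 0 as background. Under that assumption, each supervision type restricts the set of feasible assignments, and a Frank-Wolfe style solver only needs a linear minimization oracle over each restricted set. The function `lmo_video` and the `supervision` dictionary format are hypothetical. The "each tracklet is the video's class or background, and at least one tracklet carries the class" rule is one plausible encoding of a video-level label, not necessarily the paper's exact constraint.

```python
import numpy as np

def lmo_video(grad, supervision, background=0):
    """Return the one-hot assignment S minimizing <grad, S> for one video.

    grad: (n_tracklets, K) gradient of the clustering cost for this video.
    supervision: hypothetical dict with key "kind" in {"none", "video", "full"}.
    """
    n, _ = grad.shape
    rows = np.arange(n)
    S = np.zeros_like(grad)
    kind = supervision["kind"]
    if kind == "full":
        # Every tracklet's class is annotated: the feasible set is a single point.
        S[rows, supervision["labels"]] = 1.0
    elif kind == "video":
        # Video-level label c: each tracklet is c or background, at least one is c.
        c = supervision["label"]
        cost_bg, cost_c = grad[:, background], grad[:, c]
        pick_c = cost_c < cost_bg
        if not pick_c.any():
            # Force the tracklet whose switch to c raises the cost the least.
            pick_c[np.argmin(cost_c - cost_bg)] = True
        S[rows, np.where(pick_c, c, background)] = 1.0
    elif kind == "none":
        # Unlabeled video: each tracklet takes its cheapest class (background included).
        S[rows, grad.argmin(axis=1)] = 1.0
    else:
        raise ValueError(f"unknown supervision kind: {kind}")
    return S

# Usage: 4 tracklets, 3 classes (0 = background), video labeled with class 2.
g = np.array([[0.1, 0.5, 0.4],
              [0.2, 0.0, 0.9],
              [0.0, 0.3, 0.6],
              [0.3, 0.2, 0.35]])
S = lmo_video(g, {"kind": "video", "label": 2})  # only tracklet 3 gets class 2
```

The video-level branch remains an exact minimizer even with the "at least one" constraint. Every row first takes its cheapest allowed class. If no row chose c, switching exactly one row to c is required, and the cheapest such switch is optimal.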

Findings

1. Competitive results on the UCF101-24 and DALY datasets.
2. Significant performance gains from adding a few fully supervised examples.
3. Effective integration of diverse supervision levels in training.

Abstract

Spatio-temporal action detection in videos is typically addressed in a fully-supervised setup with manual annotation of training videos required at every frame. Since such annotation is extremely tedious and prohibits scalability, there is a clear need to minimize the amount of manual supervision. In this work we propose a unifying framework that can handle and combine varying types of less-demanding weak supervision. Our model is based on discriminative clustering and integrates different types of supervision as constraints on the optimization. We investigate applications of such a model to training setups with alternative supervisory signals ranging from video-level class labels to the full per-frame annotation of action bounding boxes. Experiments on the challenging UCF101-24 and DALY datasets demonstrate competitive performance of our method at a fraction of supervision used by…
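The combination of discriminative clustering with supervision expressed as constraints can be sketched in the DIFFRAC style under the following assumptions. X (n × d) holds tracklet features and Y (n × K) is an assignment matrix. A ridge-regularized linear classifier W is eliminated in closed form, which leaves a convex quadratic cost tr(Y^T B Y). The sketch then runs Frank-Wolfe with exact line search over row-stochastic Y, with annotated rows clamped to their labels. All names (`diffrac_matrix`, `feasible_start`, `frank_wolfe`, `lam`, the `fixed` dictionary) are hypothetical illustrations and are not taken from the authors' code in jalayrac/weakactionloc.

```python
import numpy as np

def diffrac_matrix(X, lam):
    """Return B with min_W (1/n)||Y - XW||_F^2 + lam*||W||_F^2 == tr(Y^T B Y)."""
    n, d = X.shape
    # Ridge optimum W* = (X^T X + n*lam*I)^{-1} X^T Y; substituting it back
    # leaves B = (I - X (X^T X + n*lam*I)^{-1} X^T) / n, which is PSD.
    A = X.T @ X + n * lam * np.eye(d)
    return (np.eye(n) - X @ np.linalg.solve(A, X.T)) / n

def feasible_start(n, K, fixed):
    """Uniform soft assignment, with annotated rows clamped to one-hot labels."""
    Y = np.full((n, K), 1.0 / K)
    for i, c in fixed.items():
        Y[i] = 0.0
        Y[i, c] = 1.0
    return Y

def lmo(grad, fixed):
    """Linear minimization over one-hot rows; annotated rows stay fixed."""
    S = np.zeros_like(grad)
    S[np.arange(grad.shape[0]), grad.argmin(axis=1)] = 1.0
    for i, c in fixed.items():
        S[i] = 0.0
        S[i, c] = 1.0
    return S

def frank_wolfe(B, Y, fixed, iters=100):
    """Minimize tr(Y^T B Y) over the constrained set; return Y and the cost history."""
    Y = Y.copy()
    history = [np.trace(Y.T @ B @ Y)]
    for _ in range(iters):
        D = lmo(2.0 * B @ Y, fixed) - Y  # direction toward the LMO vertex
        lin = np.trace(D.T @ B @ Y)      # f(Y + g*D) = f(Y) + 2g*lin + g^2*quad
        quad = np.trace(D.T @ B @ D)
        if quad > 1e-12:
            gamma = float(np.clip(-lin / quad, 0.0, 1.0))
        else:
            gamma = 1.0 if lin < 0 else 0.0
        Y = Y + gamma * D
        history.append(np.trace(Y.T @ B @ Y))
    return Y, history

# Usage: 30 tracklets, 5-dim features, 3 classes, 3 annotated tracklets.
rng = np.random.default_rng(0)
X = rng.standard_normal((30, 5))
fixed = {0: 0, 1: 1, 2: 2}
B = diffrac_matrix(X, lam=0.1)
Y, history = frank_wolfe(B, feasible_start(30, 3, fixed), fixed, iters=50)
labels = Y.argmax(axis=1)  # round the relaxed solution to hard labels
```

The objective is a convex quadratic, so the step size that minimizes it along the segment has a closed form. The cost therefore never increases, and the iterates remain convex combinations of feasible one-hot assignments.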

Code & Models

Repositories

jalayrac/weakactionloc (Official)

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Robot Manipulation and Learning