Learning from Video and Text via Large-Scale Discriminative Clustering

Antoine Miech; Jean-Baptiste Alayrac; Piotr Bojanowski; Ivan Laptev; Josef Sivic

arXiv:1707.09074·cs.CV·July 31, 2017

TL;DR

This paper introduces an online optimization method for discriminative clustering, significantly improving scalability and enabling weakly supervised learning of actions and actors from large-scale video datasets with scripts.

Contribution

It proposes a scalable online algorithm based on Block-Coordinate Frank-Wolfe for discriminative clustering, applied to weakly supervised video understanding tasks.

Findings

01

Enhanced action recognition accuracy on large-scale movie datasets

02

Successful application to weakly supervised learning from videos and scripts

03

The online formulation scales the learning problem to 66 feature-length movies

Abstract

Discriminative clustering has been successfully applied to a number of weakly-supervised learning tasks. Such applications include person and action recognition, text-to-video alignment, object co-segmentation and colocalization in videos and images. One drawback of discriminative clustering, however, is its limited scalability. We address this issue and propose an online optimization algorithm based on the Block-Coordinate Frank-Wolfe algorithm. We apply the proposed method to the problem of weakly supervised learning of actions and actors from movies together with corresponding movie scripts. Scaling up the learning problem to 66 feature-length movies enables us to significantly improve weakly supervised action recognition.
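
To make the optimization idea in the abstract concrete, here is a minimal sketch of Block-Coordinate Frank-Wolfe on a toy discriminative clustering objective. It is an illustration under stated assumptions, not the authors' implementation: it assumes a common square-loss formulation in which the cost is quadratic in the assignment matrix Y, it uses a ridge cost matrix without a bias term, it treats each row of Y (one sample) as a block constrained to the probability simplex, and it ignores the script-derived constraints and the online classifier updates described in the paper. The names `ridge_cost_matrix`, `objective`, and `bcfw_simplex_rows` are hypothetical.

```python
import numpy as np

def ridge_cost_matrix(X, lam):
    """Assumed cost matrix B = I - X (X^T X + lam I)^{-1} X^T (ridge regression, no bias)."""
    n, d = X.shape
    return np.eye(n) - X @ np.linalg.solve(X.T @ X + lam * np.eye(d), X.T)

def objective(B, Y):
    """Quadratic clustering cost 0.5 * tr(Y^T B Y)."""
    return 0.5 * np.sum(Y * (B @ Y))

def bcfw_simplex_rows(B, Y0, n_iters=1000, seed=0):
    """Block-coordinate Frank-Wolfe where each block is one row of Y on the simplex."""
    rng = np.random.default_rng(seed)
    Y = np.array(Y0, dtype=float)
    n, k = Y.shape
    G = B @ Y  # gradient of the objective, kept up to date incrementally
    for _ in range(n_iters):
        i = rng.integers(n)              # sample one block uniformly at random
        s = np.zeros(k)
        s[np.argmin(G[i])] = 1.0         # linear minimization oracle: best simplex vertex
        d = s - Y[i]                     # Frank-Wolfe direction for this block
        slope = G[i] @ d                 # directional derivative along d
        curv = B[i, i] * (d @ d)         # curvature of the quadratic along d
        if curv > 0:
            gamma = float(np.clip(-slope / curv, 0.0, 1.0))  # exact line search
        else:
            gamma = 1.0 if slope < 0 else 0.0
        Y[i] += gamma * d                # convex combination keeps the row feasible
        G += gamma * np.outer(B[:, i], d)  # rank-one gradient update
    return Y
```

A possible usage: draw `X` as a random feature matrix, build `B = ridge_cost_matrix(X, lam=1.0)`, start from random simplex rows (for example `rng.dirichlet(np.ones(k), size=n)`), and compare `objective(B, Y)` before and after the call. A random start is used because a uniform assignment is a stationary point of this toy objective. Each step touches a single block and a single column of `B`, which is the property that makes block-coordinate updates attractive when the full problem is too large to process at once.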

Code & Models

Videos

Learning from Video and Text via Large-Scale Discriminative Clustering (YouTube)