Video Understanding as Machine Translation
Bruno Korbar, Fabio Petroni, Rohit Girdhar, Lorenzo Torresani

TL;DR
This paper introduces a generative, translation-based approach to video understanding that eliminates the need for negative sampling, enabling a unified framework for various tasks with improved performance on multiple benchmarks.
Contribution
It proposes a novel generative modeling framework for video understanding that removes the reliance on negative sampling and contrastive learning, unifying multiple tasks.
Findings
Achieved state-of-the-art results on several downstream tasks.
Demonstrated effectiveness on the large-scale HowTo100M dataset.
Improved performance in video classification, question answering, captioning, and retrieval.
Abstract
With the advent of large-scale multimodal video datasets, especially sequences with audio or transcribed speech, there has been a growing interest in self-supervised learning of video representations. Most prior work formulates the objective as a contrastive metric learning problem between the modalities. To enable effective learning, however, these strategies require a careful selection of positive and negative samples often combined with hand-designed curriculum policies. In this work we remove the need for negative sampling by taking a generative modeling approach that poses the objective as a translation problem between modalities. Such a formulation allows us to tackle a wide variety of downstream video understanding tasks by means of a single unified framework, without the need for large batches of negative samples common in contrastive metric learning. We experiment with the…
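The contrast the abstract draws between the two objectives can be illustrated with a small sketch. It is not the authors' implementation: the function names, the toy inputs, and the choice of an InfoNCE-style loss to stand in for "contrastive metric learning" are all assumptions made for illustration. The point it shows is structural: the translation objective scores each target token against the vocabulary of a single (video, text) pair, while the contrastive objective needs the other items in the batch to act as negatives.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def translation_loss(step_logits, target_ids):
    """Mean per-token negative log-likelihood of the target text.

    step_logits[t] is a hypothetical decoder's vocabulary scores at step t,
    conditioned on the video and the previous target tokens (teacher forcing).
    Only one (video, text) pair is involved; no negative samples are needed.
    """
    assert len(step_logits) == len(target_ids)
    nll = [-log_softmax(logits)[tok]
           for logits, tok in zip(step_logits, target_ids)]
    return sum(nll) / len(nll)


def contrastive_loss(similarity):
    """InfoNCE-style loss over a batch (assumed stand-in for contrastive learning).

    similarity[i][j] scores video i against text j. Diagonal entries are the
    positive pairs; every off-diagonal entry in a row serves as a negative.
    """
    n = len(similarity)
    return sum(-log_softmax(row)[i] for i, row in enumerate(similarity)) / n


# Toy usage: a 2-token target over a 4-word vocabulary, and a batch of 2 pairs.
gen = translation_loss([[2.0, 0.1, 0.1, 0.1], [0.1, 0.1, 3.0, 0.1]], [0, 2])
con = contrastive_loss([[4.0, 1.0], [0.5, 3.0]])
```

With a batch of one, `contrastive_loss` is trivially zero because there is nothing to contrast against, whereas `translation_loss` still yields a learning signal from a single pair. This is one way to read the abstract's claim about avoiding large batches of negatives.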
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Cancer-related molecular mechanisms research
