Active Learning for Video Description With Cluster-Regularized Ensemble Ranking

David M. Chan; Sudheendra Vijayanarasimhan; David A. Ross; John Canny

arXiv:2007.13913·cs.CV·December 4, 2020

TL;DR

This paper investigates active learning strategies for video captioning and shows that a cluster-regularized ensemble approach can reach high captioning performance with up to 60% less manually annotated training data than strong baselines.

Contribution

It introduces a novel cluster-regularized ensemble active learning method specifically designed for video captioning tasks, improving data efficiency.

Findings

01 Achieves high performance with up to 60% less training data than strong baselines.

02 Outperforms existing active learning baselines on the MSR-VTT and LSMDC datasets.

03 Effective with both transformer and LSTM captioning models.

Abstract

Automatic video captioning aims to train models to generate text descriptions for all segments in a video. However, the most effective approaches require large amounts of manual annotation, which is slow and expensive. Active learning is a promising way to efficiently build a training set for video captioning tasks while reducing the need to manually label uninformative examples. In this work we explore various active learning approaches for automatic video captioning and show that a cluster-regularized ensemble strategy provides the best active learning approach to efficiently gather training sets for video captioning. We evaluate our approaches on the MSR-VTT and LSMDC datasets using both transformer and LSTM-based captioning models and show that our novel strategy can achieve high performance while using up to 60% less training data than strong state-of-the-art baselines.
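
The abstract names the strategy but does not spell out its mechanics, so the following is only a rough sketch of what one round of cluster-regularized ensemble ranking might look like. Everything here is an assumption made for illustration: the function names, the use of variance across ensemble members as the uncertainty score, and the per-cluster cap as the regularizer are not taken from the paper.

```python
import numpy as np

def ensemble_disagreement(member_scores):
    """Hypothetical uncertainty score (an assumption, not the paper's measure).

    member_scores: (n_members, n_videos) array holding one scalar per
    ensemble member per unlabeled video, e.g. a mean caption log-likelihood.
    Returns the variance across members for each video.
    """
    return np.var(np.asarray(member_scores, dtype=float), axis=0)

def select_batch(uncertainty, cluster_ids, batch_size, max_per_cluster):
    """Greedy selection: take the most uncertain videos first, but never
    more than max_per_cluster from any single cluster (assumed regularizer)."""
    # Stable sort on the negated scores gives a descending ranking.
    order = np.argsort(-np.asarray(uncertainty, dtype=float), kind="stable")
    picked, counts = [], {}
    for idx in order:
        cluster = int(cluster_ids[idx])
        if counts.get(cluster, 0) >= max_per_cluster:
            continue  # this cluster has already filled its quota
        picked.append(int(idx))
        counts[cluster] = counts.get(cluster, 0) + 1
        if len(picked) == batch_size:
            break
    return picked

# Toy data: 2 ensemble members, 4 unlabeled videos, 2 clusters.
member_scores = [[0.0, 0.0, 0.0, 0.0],
                 [4.0, 2.0, 1.0, 0.0]]
cluster_ids = [0, 0, 1, 1]
uncertainty = ensemble_disagreement(member_scores)
batch = select_batch(uncertainty, cluster_ids, batch_size=2, max_per_cluster=1)
```

In this toy run the two most uncertain videos share a cluster, so the cap forces the second pick to come from the other cluster. That is the general intuition behind regularizing an uncertainty ranking with clusters: the batch sent for annotation stays diverse instead of concentrating on near-duplicate clips.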

Taxonomy

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory