Pseudo-labeling with Keyword Refining for Few-Supervised Video Captioning

Ping Li; Tao Wang; Xinkui Zhao; Xianghua Xu; Mingli Song

arXiv:2411.04059·cs.CV·November 7, 2024

Open Access · 1 Repo

TL;DR

This paper introduces a novel framework for video captioning that effectively utilizes minimal supervision by pseudo-labeling and keyword refinement, significantly reducing the need for extensive annotated data.

Contribution

It proposes a new few-supervised video captioning approach combining pseudo-labeling with keyword refinement, enhancing caption quality with limited ground-truth sentences.

Findings

01 Outperforms existing methods in few-supervised settings

02 Achieves competitive results with minimal annotated data

03 Demonstrates effectiveness on multiple benchmark datasets

Abstract

Video captioning generates a sentence that describes the video content. Existing methods require multiple captions (e.g., 10 or 20) per video to train the model, which is quite costly. In this work, we explore the possibility of using only one or very few ground-truth sentences, and introduce a new task named few-supervised video captioning. Specifically, we propose a few-supervised video captioning framework that consists of a lexically constrained pseudo-labeling module and a keyword-refined captioning module. Unlike the random sampling in natural language processing that may cause invalid modifications (i.e., edit words), the former module guides the model to edit words using some actions (e.g., copy, replace, insert, and delete) by a pretrained token-level classifier, and then fine-tunes candidate sentences by a pretrained language model. Meanwhile, the former employs the…
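The edit actions named in the abstract (copy, replace, insert, delete) can be pictured with a small sketch. This is not the authors' implementation: the function name `apply_edit_actions`, the `(operation, word)` action format, and the choice to place an inserted word after the current token are all assumptions made for illustration. In the paper the actions come from a pretrained token-level classifier and the candidates are then fine-tuned by a pretrained language model; here the actions are written by hand and no model is involved.

```python
from typing import List, Optional, Tuple

# Hypothetical action format: (operation, new_word).
# new_word is None for "copy" and "delete".
Action = Tuple[str, Optional[str]]


def apply_edit_actions(tokens: List[str], actions: List[Action]) -> List[str]:
    """Apply one edit action per token and return the edited sentence.

    Assumed semantics (illustrative only):
      copy    -> keep the token unchanged
      replace -> substitute the token with new_word
      insert  -> keep the token and add new_word right after it
      delete  -> drop the token
    """
    if len(tokens) != len(actions):
        raise ValueError("need exactly one action per token")

    edited: List[str] = []
    for token, (op, word) in zip(tokens, actions):
        if op == "copy":
            edited.append(token)
        elif op == "replace":
            if word is None:
                raise ValueError("replace needs a new word")
            edited.append(word)
        elif op == "insert":
            if word is None:
                raise ValueError("insert needs a new word")
            edited.extend([token, word])
        elif op == "delete":
            continue
        else:
            raise ValueError(f"unknown action: {op}")
    return edited


# A single ground-truth caption and a hand-written action sequence.
caption = "a man is playing a guitar".split()
actions: List[Action] = [
    ("copy", None),
    ("replace", "woman"),
    ("copy", None),
    ("copy", None),
    ("delete", None),
    ("insert", "loudly"),
]
pseudo_caption = apply_edit_actions(caption, actions)
print(" ".join(pseudo_caption))
```

Constraining edits to a fixed action set, rather than sampling arbitrary word changes, is what the abstract contrasts with random sampling; the sketch only shows how such actions transform one caption into a candidate pseudo-label.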

Code & Models

Repositories

mlvccn/pkg_vidcap
PyTorch · Official

Taxonomy

Topics: Video Analysis and Summarization · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques