Pseudo-labeling with Keyword Refining for Few-Supervised Video Captioning
Ping Li, Tao Wang, Xinkui Zhao, Xianghua Xu, Mingli Song

TL;DR
This paper introduces a video captioning framework that learns from minimal supervision, as little as one ground-truth sentence per video, by combining pseudo-labeling with keyword refinement, which reduces the need for extensive annotated data.
Contribution
It proposes the task of few-supervised video captioning and a framework that combines a lexically constrained pseudo-labeling module with a keyword-refined captioning module, improving caption quality when only a few ground-truth sentences are available.
Findings
Outperforms existing methods in few-supervised settings
Achieves competitive results with minimal annotated data
Demonstrates effectiveness on multiple benchmark datasets
Abstract
Video captioning generates a sentence that describes the content of a video. Existing methods always require many captions (e.g., 10 or 20) per video to train the model, which is costly to annotate. In this work, we explore the possibility of using only one or very few ground-truth sentences per video, and introduce a new task named few-supervised video captioning. Specifically, we propose a few-supervised video captioning framework that consists of a lexically constrained pseudo-labeling module and a keyword-refined captioning module. Unlike random sampling in natural language processing, which may cause invalid modifications (i.e., word edits), the former module guides the model to edit words using actions (e.g., copy, replace, insert, and delete) predicted by a pretrained token-level classifier, and then fine-tunes the candidate sentences with a pretrained language model. Meanwhile, the former employs the…
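The edit-based pseudo-labeling step described in the abstract can be illustrated with a minimal sketch. It assumes that a token-level classifier has already assigned one action (copy, replace, insert, or delete) to each word of a ground-truth sentence, and it interprets "lexically constrained" as "certain keywords may not be replaced or deleted". The action encoding, the `apply_edits` function, and the `propose` callback (a hypothetical stand-in for the pretrained language model that supplies new words) are illustrative assumptions, not the authors' implementation.

```python
from typing import AbstractSet, Callable, List, Sequence

# Hypothetical action labels; the paper's classifier predicts edit actions
# like these, but its exact label set and encoding are assumptions here.
COPY, REPLACE, INSERT, DELETE = "copy", "replace", "insert", "delete"
VALID_ACTIONS = {COPY, REPLACE, INSERT, DELETE}


def apply_edits(
    tokens: Sequence[str],
    actions: Sequence[str],
    propose: Callable[[List[str], int], str],
    keywords: AbstractSet[str] = frozenset(),
) -> List[str]:
    """Apply one round of token-level edits to build a candidate pseudo-label.

    propose(prefix, i) is a hypothetical word proposer (e.g., a language
    model) that returns a word given the sentence built so far and the
    position i of the current source token.
    Assumed lexical constraint: keywords are never replaced or deleted.
    """
    if len(tokens) != len(actions):
        raise ValueError("need exactly one action per token")
    out: List[str] = []
    for i, (tok, act) in enumerate(zip(tokens, actions)):
        if act not in VALID_ACTIONS:
            raise ValueError(f"unknown action: {act}")
        if act == INSERT:
            # Insert a proposed word before the current token, then keep it.
            out.append(propose(out, i))
            out.append(tok)
        elif act == COPY or tok in keywords:
            # Copy the token; keywords are also kept when REPLACE/DELETE is predicted.
            out.append(tok)
        elif act == REPLACE:
            out.append(propose(out, i))
        # DELETE on a non-keyword: the token is dropped.
    return out


if __name__ == "__main__":
    seed = "a man is playing a guitar".split()
    acts = [COPY, COPY, COPY, REPLACE, COPY, INSERT]
    # Toy proposer keyed by position, standing in for a real language model.
    toy = lambda prefix, i: {3: "strumming", 5: "red"}[i]
    edited = apply_edits(seed, acts, toy, keywords=frozenset({"man", "guitar"}))
    print(" ".join(edited))  # a man is strumming a red guitar
```

Blocking edits on keywords is one simple way to keep pseudo-labels anchored to the original sentence's content; in the abstract, the resulting candidates are further refined by a pretrained language model, a step this sketch does not model.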
Taxonomy
Topics: Video Analysis and Summarization · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques
