Context Consistency Learning via Sentence Removal for Semi-Supervised Video Paragraph Grounding
Yaokun Zhong, Siyu Jiang, Jian Zhu, Jian-Fang Hu

TL;DR
This paper introduces a semi-supervised framework for video paragraph grounding that perturbs the query context by removing sentences and applies consistency regularization to improve localization accuracy when temporal annotations are limited.
Contribution
The paper proposes Context Consistency Learning (CCL), a framework that unifies consistency regularization and pseudo-labeling for semi-supervised video paragraph grounding.
Findings
CCL significantly outperforms existing methods in experiments.
Perturbing the query context, here by removing sentences from the paragraph, yields stronger supervisory signals.
The framework effectively utilizes limited annotations for accurate localization.
Abstract
Semi-Supervised Video Paragraph Grounding (SSVPG) aims to localize multiple sentences in a paragraph from an untrimmed video with limited temporal annotations. Existing methods focus on teacher-student consistency learning and video-level contrastive loss, but they overlook the importance of perturbing query contexts to generate strong supervisory signals. In this work, we propose a novel Context Consistency Learning (CCL) framework that unifies the paradigms of consistency regularization and pseudo-labeling to enhance semi-supervised learning. Specifically, we first conduct teacher-student learning where the student model takes as inputs strongly-augmented samples with sentences removed and is enforced to learn from the adequately strong supervisory signals from the teacher model. Afterward, we conduct model retraining based on the generated pseudo labels, where the mutual agreement…
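The teacher-student step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `remove_sentences` and `consistency_loss`, the normalized (start, end) span format, and the L1 distance are all assumptions chosen for illustration, and the abstract does not specify the actual loss. The idea shown is that the teacher sees the full paragraph, the student sees a paragraph with some sentences removed, and the student's predictions for the remaining sentences are pulled toward the teacher's.

```python
import random


def remove_sentences(sentences, num_remove, rng):
    """Drop `num_remove` randomly chosen sentences (hypothetical strong augmentation).

    Returns the indices of the kept sentences and the kept sentences themselves,
    both in their original order.
    """
    if not 0 <= num_remove < len(sentences):
        raise ValueError("num_remove must leave at least one sentence")
    removed = set(rng.sample(range(len(sentences)), num_remove))
    kept_idx = [i for i in range(len(sentences)) if i not in removed]
    return kept_idx, [sentences[i] for i in kept_idx]


def consistency_loss(teacher_spans, student_spans, kept_idx):
    """Mean L1 distance between teacher and student spans (assumed loss form).

    teacher_spans: one (start, end) pair per sentence of the full paragraph.
    student_spans: one (start, end) pair per kept sentence, in kept order.
    """
    if len(student_spans) != len(kept_idx):
        raise ValueError("student_spans must align with kept_idx")
    total = 0.0
    for j, i in enumerate(kept_idx):
        t_start, t_end = teacher_spans[i]
        s_start, s_end = student_spans[j]
        total += abs(t_start - s_start) + abs(t_end - s_end)
    return total / len(kept_idx)


# Toy example with made-up sentences and spans (fractions of video length).
paragraph = [
    "A man enters the kitchen.",
    "He chops vegetables.",
    "He plates the dish.",
]
teacher_spans = [(0.0, 0.2), (0.2, 0.7), (0.7, 1.0)]

kept_idx, reduced_paragraph = remove_sentences(paragraph, 1, random.Random(0))
# Stand-in for the student's output on the reduced paragraph.
student_spans = [teacher_spans[i] for i in kept_idx]
loss = consistency_loss(teacher_spans, student_spans, kept_idx)
```

In a real system both span lists would come from grounding models, and the teacher's predictions would be treated as fixed targets; the toy values above only show how the kept indices align the two sets of predictions.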
