Exploring Temporal Event Cues for Dense Video Captioning in Cyclic Co-learning
Zhuyang Xie, Yan Yang, Yankai Yu, Jie Wang, Yongquan Jiang, Xiao Wu

TL;DR
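This paper introduces Multi-Concept Cyclic Learning (MCCL), a dense video captioning model that combines frame-level concept detection with cyclic co-learning between the caption generator and the event localizer to improve event localization and description in untrimmed videos, achieving state-of-the-art results on ActivityNet Captions.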
Contribution
The paper proposes a novel cyclic co-learning framework that integrates weakly supervised concept detection with captioning for enhanced dense video captioning.
Findings
Achieves state-of-the-art performance on ActivityNet Captions.
Demonstrates effectiveness of cyclic co-learning in video captioning.
Improves semantic perception and event localization accuracy.
Abstract
Dense video captioning aims to detect and describe all events in untrimmed videos. This paper presents a dense video captioning network called Multi-Concept Cyclic Learning (MCCL), which aims to: (1) detect multiple concepts at the frame level, using these concepts to enhance video features and provide temporal event cues; and (2) design cyclic co-learning between the generator and the localizer within the captioning network to promote semantic perception and event localization. Specifically, we perform weakly supervised concept detection for each frame, and the detected concept embeddings are integrated into the video features to provide event cues. Additionally, video-level concept contrastive learning is introduced to obtain more discriminative concept embeddings. In the captioning network, we establish a cyclic co-learning strategy where the generator guides the localizer for event…
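The abstract's first component, weakly supervised frame-level concept detection with concept embeddings folded back into the video features, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the max-over-time pooling that turns frame-level predictions into a video-level prediction, the binary cross-entropy loss against video-level concept labels, the additive probability-weighted fusion, and all function names.

```python
import math


def sigmoid(x):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def frame_concept_probs(frame_logits):
    """Per-frame concept probabilities.

    frame_logits: list of T frames, each a list of C concept logits.
    """
    return [[sigmoid(z) for z in frame] for frame in frame_logits]


def video_concept_probs(frame_probs):
    """Aggregate frame-level probabilities into one video-level vector.

    Assumption: max-pooling over time, a common multiple-instance choice
    when only video-level concept labels are available.
    """
    num_concepts = len(frame_probs[0])
    return [max(frame[c] for frame in frame_probs) for c in range(num_concepts)]


def weak_bce_loss(video_probs, video_labels, eps=1e-7):
    """Binary cross-entropy between video-level predictions and weak labels."""
    total = 0.0
    for p, y in zip(video_probs, video_labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(video_labels)


def enhance_features(frame_feats, frame_probs, concept_embeds):
    """Add a probability-weighted sum of concept embeddings to each frame feature.

    Assumption: additive fusion; concept_embeds holds C vectors with the
    same dimension as the frame features.
    """
    enhanced = []
    for feat, probs in zip(frame_feats, frame_probs):
        cue = [
            sum(p * emb[d] for p, emb in zip(probs, concept_embeds))
            for d in range(len(feat))
        ]
        enhanced.append([f + c for f, c in zip(feat, cue)])
    return enhanced
```

In this sketch the only supervision is the video-level label vector, so no frame needs its own annotation; the per-frame probabilities then act as the temporal cue, since a concept that is active only in some frames changes the features of those frames alone.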
Taxonomy
Topics: Advanced Vision and Imaging · Subtitles and Audiovisual Media · Multimodal Machine Learning Applications
Methods: Contrastive Learning
