Contrastive Video-Language Learning with Fine-grained Frame Sampling
Zixu Wang, Yujie Zhong, Yishu Miao, Lin Ma, Lucia Specia

TL;DR
This paper introduces FineCo, a fine-grained contrastive learning method that improves video-language representation by selecting semantically relevant frames, leading to state-of-the-art results on long video retrieval benchmarks.
Contribution
The paper proposes a novel fine-grained contrastive loss for frame sampling, enhancing cross-modal alignment by focusing on semantically relevant frames within videos.
Findings
Achieves state-of-the-art results on the YouCookII benchmark.
Provides competitive results on MSR-VTT retrieval tasks.
Improves video question answering performance.
Abstract
Despite recent progress in video and language representation learning, the weak or sparse correspondence between the two modalities remains a bottleneck in the area. Most video-language models are trained via pair-level loss to predict whether a pair of video and text is aligned. However, even in paired video-text segments, only a subset of the frames is semantically relevant to the corresponding text, with the remainder representing noise, and the ratio of noisy frames is higher for longer videos. We propose FineCo (Fine-grained Contrastive Loss for Frame Sampling), an approach to better learn video and language representations with a fine-grained contrastive objective operating on video frames. It helps distil a video by selecting the frames that are semantically equivalent to the text, improving cross-modal correspondence. Building on the well-established VideoCLIP model as a…
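The idea of scoring frames against the text and contrasting the relevant ones with the rest can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the loss in detail, so the code below assumes cosine similarity as the relevance score, a fixed top-k selection, and a generic InfoNCE-style objective over frames. The function names (cosine, select_frames, frame_contrastive_loss), the temperature value, and the toy embeddings are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def select_frames(frame_embs, text_emb, k):
    """Score every frame against the text and keep the k best.

    Returns the selected frame indices (in temporal order) and all scores.
    Top-k selection is an assumption made for this sketch.
    """
    scores = [cosine(f, text_emb) for f in frame_embs]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k]), scores


def frame_contrastive_loss(scores, positive_idx, temperature=0.1):
    """Generic InfoNCE-style loss over frames (illustrative only).

    Selected frames act as positives; every frame appears in the softmax
    denominator, so unselected frames act as negatives.
    """
    logits = [s / temperature for s in scores]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    # Average negative log-probability assigned to the positive frames.
    return -sum(logits[i] - log_denom for i in positive_idx) / len(positive_idx)


# Toy data: one 2-d text embedding and four frame embeddings (made up).
text_emb = [1.0, 0.0]
frame_embs = [[0.9, 0.1], [0.0, 1.0], [1.0, 0.2], [-1.0, 0.0]]

selected, scores = select_frames(frame_embs, text_emb, k=2)
loss = frame_contrastive_loss(scores, selected)
```

In this toy setup the two frames pointing roughly in the same direction as the text embedding are selected, and the loss is lower when those frames are treated as positives than when the unrelated frames are. A real system would compute the embeddings with learned video and text encoders and backpropagate through the loss.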
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Cancer-related molecular mechanisms research
