Video DataFlywheel: Resolving the Impossible Data Trinity in Video-Language Understanding

Xiao Wang, Jianlong Wu, Zijia Lin, Fuzheng Zhang, Di Zhang, and Liqiang Nie

arXiv:2409.19532 · cs.CV · October 1, 2024

Open Access

TL;DR

This paper introduces the Video DataFlywheel framework, an iterative method that refines video annotations while controlling noise in the synthetic annotations. It targets data scarcity in large-scale video-language pre-training and the trade-off among data quantity, diversity, and quality.

Contribution

The paper proposes an iterative refinement framework combined with AdaTaiLr noise control, aiming to improve dataset quality and scalability for video-language pre-training.

Findings

01. Achieves a 3% performance boost over existing data refinement baselines.

02. Improves dataset quality with minimal loss of diversity.

03. Improves downstream video question answering and text-video retrieval.

Abstract

Recently, video-language understanding has achieved great success through large-scale pre-training. However, data scarcity remains a prevailing challenge. This study quantitatively reveals an "impossible trinity" among data quantity, diversity, and quality in pre-training datasets. Recent efforts seek to refine large-scale, diverse ASR datasets compromised by low quality through synthetic annotations. These methods successfully leverage useful information in multimodal video content (frames, tags, ASR transcripts, etc.) to refine the original annotations. Nevertheless, they struggle to mitigate noise within synthetic annotations and lack scalability as the dataset size expands. To address these issues, we introduce the Video DataFlywheel framework, which iteratively refines video annotations with improved noise control methods. For iterative refinement, we first leverage a…

Taxonomy

Topics: Video Analysis and Summarization