VideoWeave: A Data-Centric Approach for Efficient Video Understanding

Zane Durante; Silky Singh; Arpandeep Khatua; Shobhit Agarwal; Reuben Tan; Yong Jae Lee; Jianfeng Gao; Ehsan Adeli; Li Fei-Fei

arXiv:2601.06309·cs.CV·January 13, 2026

TL;DR

VideoWeave is a data-centric method that makes video-language model training more data-efficient by splicing short, captioned videos into synthetic long-context samples, without changing the model architecture.

Contribution

The paper proposes a data reorganization technique for training video-language models that relies on how the training data is composed rather than on changes to the architecture or optimization objective.

Findings

01 Higher accuracy with VideoWeave under the same compute

02 Data reorganization improves downstream performance

03 Effective without changing model architectures

Abstract

Training video-language models is often prohibitively expensive due to the high cost of processing long frame sequences and the limited availability of annotated long videos. We present VideoWeave, a simple yet effective approach to improve data efficiency by constructing synthetic long-context training samples that splice together short, captioned videos from existing datasets. Rather than modifying model architectures or optimization objectives, VideoWeave reorganizes available video-text pairs to expand temporal diversity within fixed compute. We systematically study how different data composition strategies, such as random versus visually clustered splicing and caption enrichment, affect performance on downstream video question answering. Under identical compute constraints, models trained with VideoWeave achieve higher accuracy than conventional video finetuning. Our results…
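The splicing idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Clip` type, the even split of a fixed frame budget across clips, the uniform frame sampling, and the "Segment i:" caption format are all assumptions made for illustration, and only the random splicing strategy is shown.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Clip:
    """A short captioned video (hypothetical type); frames are stand-ins such as frame IDs."""
    frames: List[str]
    caption: str


def sample_frames(frames: Sequence[str], k: int) -> List[str]:
    """Pick k frames at evenly spaced indices, or all frames if there are at most k."""
    if len(frames) <= k:
        return list(frames)
    step = len(frames) / k
    return [frames[int(i * step)] for i in range(k)]


def weave(clips: Sequence[Clip], frame_budget: int) -> Clip:
    """Splice clips in order into one sample that uses at most frame_budget frames.

    Assumption: the budget is split evenly across clips and captions are
    concatenated with a segment prefix; the source does not specify either.
    """
    if not clips:
        raise ValueError("need at least one clip")
    per_clip = frame_budget // len(clips)
    if per_clip == 0:
        raise ValueError("frame_budget is smaller than the number of clips")
    frames: List[str] = []
    for clip in clips:
        frames.extend(sample_frames(clip.frames, per_clip))
    caption = " ".join(
        f"Segment {i + 1}: {clip.caption}" for i, clip in enumerate(clips)
    )
    return Clip(frames=frames, caption=caption)


def random_weave(pool: Sequence[Clip], clips_per_sample: int,
                 frame_budget: int, seed: int = 0) -> Clip:
    """Random splicing: draw clips uniformly without replacement, then weave them."""
    rng = random.Random(seed)
    chosen = rng.sample(list(pool), clips_per_sample)
    return weave(chosen, frame_budget)
```

Because the frame budget is fixed, a woven sample costs no more frames than a single conventional video sample while covering several distinct scenes. A visually clustered variant would replace the uniform draw in `random_weave` with a draw from clips grouped by visual similarity.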

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis