VDCook: DIY video data cook your MLLMs

Chengwei Wu

arXiv:2603.05539 · cs.LG · May 11, 2026


TL;DR

VDCook is a configurable, self-evolving video data platform that automates data construction, annotation, and continuous updating for machine learning applications, lowering barriers for specialized dataset creation.

Contribution

It introduces a novel, automated system for dynamic video dataset creation and expansion, integrating query-based data retrieval, synthesis, and multi-dimensional metadata annotation.

Findings

01. Enables continuous dataset updates through automated ingestion.

02. Supports domain-specific video data generation with provenance and metadata.

03. Facilitates community-driven dataset expansion and governance.

Abstract

We introduce VDCook, a self-evolving video data operating system: a configurable video data construction platform for researchers and vertical-domain teams. Users initiate data requests via natural language queries and adjustable parameters (scale, retrieval-synthesis ratio, quality threshold). The system automatically optimizes the query and runs its real-video retrieval and controlled synthesis modules concurrently. It then generates in-domain data packages with complete provenance and metadata, along with reproducible notebooks. Unlike traditional static datasets that are built once, VDCook enables continuous updates and domain expansion through an automated data ingestion mechanism based on MCP (Model Context Protocol) \cite{mcp2024anthropic}, turning datasets into dynamically evolving open ecosystems. The system also provides multi-dimensional metadata annotation…
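To make the request parameters named in the abstract concrete, here is a minimal Python sketch of what a data request might look like. It is not VDCook's actual API: the class `DataRequest`, its field names, and the helpers `split_budget` and `passes_quality` are hypothetical. The sketch also assumes that the retrieval-synthesis ratio is the fraction of clips drawn from real-video retrieval and that quality scores lie in [0, 1]; the abstract specifies neither.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataRequest:
    """Hypothetical request object; names are illustrative, not VDCook's API."""

    query: str                # natural language description of the target domain
    scale: int                # total number of clips requested
    retrieval_ratio: float    # ASSUMPTION: fraction of clips from real-video retrieval
    quality_threshold: float  # ASSUMPTION: minimum quality score, in [0, 1]

    def __post_init__(self) -> None:
        # Reject parameter values that fall outside the assumed ranges.
        if self.scale < 0:
            raise ValueError("scale must be non-negative")
        if not 0.0 <= self.retrieval_ratio <= 1.0:
            raise ValueError("retrieval_ratio must be in [0, 1]")
        if not 0.0 <= self.quality_threshold <= 1.0:
            raise ValueError("quality_threshold must be in [0, 1]")


def split_budget(request: DataRequest) -> tuple[int, int]:
    """Split the requested scale into (retrieved, synthesized) clip counts."""
    retrieved = round(request.scale * request.retrieval_ratio)
    return retrieved, request.scale - retrieved


def passes_quality(score: float, request: DataRequest) -> bool:
    """Keep a clip only if its score meets the request's threshold."""
    return score >= request.quality_threshold


if __name__ == "__main__":
    request = DataRequest(
        query="surgical endoscopy videos",
        scale=1000,
        retrieval_ratio=0.75,
        quality_threshold=0.8,
    )
    print(split_budget(request))
    print(passes_quality(0.9, request))
```

Under these assumptions, a request for 1000 clips at a ratio of 0.75 would allocate 750 clips to retrieval and 250 to synthesis, with the quality threshold applied to both sources afterwards.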


Code & Models

Repositories

https://screenapp.io/app/v/WP0SvffgsH
github
