FineVision: Open Data Is All You Need
Luis Wiedmann, Orr Zohar, Amir Mahla, Xiaohan Wang, Rui Li, Thibaud Frere, Leandro von Werra, Aritra Roy Gosthipaty, Andrés Marafioti

TL;DR
FineVision is a large, carefully curated open dataset of 24 million vision-language samples, designed to improve the training and evaluation of vision-language models through rigorous data collection and cleaning.
Contribution
The paper introduces FineVision, the largest unified, high-quality open dataset for vision-language models, with a semi-automated, human-in-the-loop curation process.
Findings
Models trained on FineVision outperform those trained on other open datasets.
FineVision's data hygiene and scale lead to better model performance.
The dataset and tools are publicly released to support future research.
Abstract
The advancement of vision-language models (VLMs) is hampered by a fragmented landscape of inconsistent and contaminated public datasets. We introduce FineVision, a meticulously collected, curated, and unified corpus of 24 million samples - the largest open resource of its kind. We unify more than 200 sources into 185 subsets via a semi-automated, human-in-the-loop pipeline: automation performs bulk ingestion and schema mapping, while reviewers audit mappings and spot-check outputs to verify faithful consumption of annotations, appropriate formatting and diversity, and safety; issues trigger targeted fixes and re-runs. The workflow further applies rigorous de-duplication within and across sources and decontamination against 66 public benchmarks. FineVision also encompasses agentic/GUI tasks with a unified action space; reviewers validate schemas and inspect a sample of trajectories to…
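The de-duplication and decontamination step described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: it swaps in exact SHA-256 matching on raw image bytes, whereas a real pipeline would more likely use a similarity measure that tolerates resizing and re-encoding. The function names, the sample schema (`source`, `image`), and the toy data are all hypothetical.

```python
import hashlib
from typing import Dict, Iterable, List, Set


def content_key(image_bytes: bytes) -> str:
    """Exact-match fingerprint of raw image bytes.

    Assumption: a stand-in for whatever similarity measure a real
    pipeline uses; it only catches byte-identical images.
    """
    return hashlib.sha256(image_bytes).hexdigest()


def dedup_and_decontaminate(
    samples: Iterable[Dict],
    benchmark_images: Iterable[bytes],
) -> List[Dict]:
    """Drop samples whose image appears in a benchmark or was already seen."""
    # Fingerprints of benchmark images that must not leak into training data.
    blocked: Set[str] = {content_key(b) for b in benchmark_images}
    seen: Set[str] = set()
    kept: List[Dict] = []
    for sample in samples:
        key = content_key(sample["image"])
        if key in blocked:
            continue  # decontamination: image overlaps a benchmark
        if key in seen:
            continue  # de-duplication: within or across sources
        seen.add(key)
        kept.append(sample)
    return kept


# Toy data (hypothetical): two sources share one image, and one image
# also appears in a benchmark.
samples = [
    {"source": "a", "image": b"img-1"},
    {"source": "b", "image": b"img-1"},
    {"source": "b", "image": b"img-2"},
    {"source": "c", "image": b"img-3"},
]
benchmark_images = [b"img-3"]

kept = dedup_and_decontaminate(samples, benchmark_images)
print([(s["source"], s["image"]) for s in kept])
```

Keeping the first occurrence of each image makes the result depend on source order, so a real pipeline would need an explicit rule for which copy to retain.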
Code & Models
- HuggingFaceM4/FineVision (dataset, 145k downloads)
- HuggingFaceM4/FineVisionMax (dataset, 9.5k downloads)
- Windwave/FineVision (dataset, 208 downloads)
- WenqingCao/finevisionmax-strict-ans-ablation (dataset, 5.1k downloads)
- WenqingCao/fv-pipeline-test (dataset, 51 downloads)
- WenqingCao/fv-pipeline-test-v2 (dataset, 51 downloads)
- WenqingCao/fv-annot-test (dataset, 20 downloads)
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Ethics and Social Impacts of AI
