Eagle 2: Building Post-Training Data Strategies from Scratch for Frontier Vision-Language Models

Zhiqi Li; Guo Chen; Shilong Liu; Shihao Wang; Vibashan VS; Yishen Ji; Shiyi Lan; Hao Zhang; Yilin Zhao; Subhashree Radhakrishnan; Nadine Chang; Karan Sapra; Amala Sanjay Deshmukh; Tuomas Rintamaki; Matthieu Le; Ilia Karmanov; Lukas Voegtle; Philipp Fischer; De-An Huang; Timo Roman; Tong Lu; Jose M. Alvarez; Bryan Catanzaro; Jan Kautz; Andrew Tao; Guilin Liu; Zhiding Yu

arXiv:2501.14818 · cs.CV · January 28, 2025 · 2 cites

Open Access · 1 Repo · 10 Models · 1 Dataset

TL;DR

Eagle 2 demonstrates that carefully designed post-training data strategies can significantly enhance open-source vision-language models, achieving state-of-the-art results comparable to larger proprietary models.

Contribution

This work introduces a novel data-centric post-training strategy for VLMs, providing detailed insights and recipes to develop competitive open-source models from scratch.

Findings

01

Eagle2-9B achieves state-of-the-art results on multiple benchmarks.

02

The data strategy enables smaller models to match larger proprietary models.

03

The detailed account of the development process benefits the open-source VLM community.

Abstract

Recently, promising progress has been made by open-source vision-language models (VLMs) in bringing their capabilities closer to those of proprietary frontier models. However, most open-source models only publish their final model weights, leaving the critical details of data strategies and implementation largely opaque. In this work, we address VLM post-training from a data-centric perspective, showing the key role of data strategy in developing frontier VLMs. By studying and building our post-training data strategy from scratch, we share detailed insights into the development processes, aiming to benefit the development of competitive models for the open-source community. Our introduced data strategy, together with training recipes and model design, leads to a family of performant VLMs named Eagle2. Specifically, Eagle2-9B achieves state-of-the-art results across various multimodal…

Code & Models

Repositories

nvlabs/eagle
PyTorch · Official

Datasets

nvidia/Llama-Nemotron-VLM-Dataset-v1
dataset · 1.5k dl

Taxonomy

Topics: Natural Language Processing Techniques · Multimodal Machine Learning Applications · Topic Modeling