Vero: An Open RL Recipe for General Visual Reasoning

Gabriel Sarch; Linrong Cai; Qunzhong Wang; Haoyang Wu; Danqi Chen; Zhuang Liu

arXiv:2604.04917·cs.CV·April 8, 2026


TL;DR

Vero is a family of fully open vision-language models trained with reinforcement learning on Vero-600K, a 600K-sample dataset spanning six broad task categories. It achieves state-of-the-art visual reasoning on VeroEval, a suite of 30 challenging benchmarks.

Contribution

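The paper introduces Vero, a family of fully open, RL-trained vision-language models, together with Vero-600K (600K samples drawn from 59 datasets) and task-routed rewards that handle heterogeneous answer formats. Vero matches or exceeds existing open-weight models across diverse visual reasoning tasks.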

Findings

01

Vero improves over four base models by 3.6-5.3 points on average across the 30 benchmarks of VeroEval.

02

Vero outperforms open-weight models such as Qwen3-VL-8B-Thinking on most benchmarks.

03

Diverse task categories are key to effective RL scaling and reasoning transfer.

Abstract

What does it take to build a visual reasoner that works across charts, science, spatial understanding, and open-ended tasks? The strongest vision-language models (VLMs) show such broad visual reasoning is within reach, but the recipe behind them remains unclear, locked behind proprietary reinforcement learning (RL) pipelines with non-public data. We introduce Vero, a family of fully open VLMs that matches or exceeds existing open-weight models across diverse visual reasoning tasks. We scale RL data and rewards across six broad task categories, constructing Vero-600K, a 600K-sample dataset from 59 datasets, and designing task-routed rewards that handle heterogeneous answer formats. Vero achieves state-of-the-art performance, improving over four base models by 3.6-5.3 points on average across VeroEval, our suite of 30 challenging benchmarks. Starting from Qwen3-VL-8B-Instruct, Vero…
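The abstract mentions task-routed rewards that handle heterogeneous answer formats but does not spell out how they work. The following is a minimal Python sketch of what such routing could look like, not the authors' implementation. The task-type keys (`multiple_choice`, `numeric`, `grounding`), the three reward functions, the 1% numeric tolerance, and the comma-separated box format are all assumptions made for illustration.

```python
import math
from typing import Callable, Dict, List, Optional


def exact_match_reward(prediction: str, target: str) -> float:
    """Hypothetical reward for choice-style answers: case-insensitive exact match."""
    return 1.0 if prediction.strip().lower() == target.strip().lower() else 0.0


def numeric_reward(prediction: str, target: str, rel_tol: float = 0.01) -> float:
    """Hypothetical reward for numeric answers: 1.0 within an assumed 1% relative tolerance."""
    try:
        pred, gold = float(prediction), float(target)
    except ValueError:
        return 0.0  # unparseable answers earn no reward
    return 1.0 if math.isclose(pred, gold, rel_tol=rel_tol) else 0.0


def _parse_box(text: str) -> Optional[List[float]]:
    """Parse an assumed 'x1,y1,x2,y2' box string; return None if malformed."""
    try:
        values = [float(v) for v in text.split(",")]
    except ValueError:
        return None
    return values if len(values) == 4 else None


def box_iou_reward(prediction: str, target: str) -> float:
    """Hypothetical reward for grounding answers: intersection-over-union of two boxes."""
    p, g = _parse_box(prediction), _parse_box(target)
    if p is None or g is None:
        return 0.0
    inter_w = max(0.0, min(p[2], g[2]) - max(p[0], g[0]))
    inter_h = max(0.0, min(p[3], g[3]) - max(p[1], g[1]))
    inter = inter_w * inter_h
    area_p = max(0.0, p[2] - p[0]) * max(0.0, p[3] - p[1])
    area_g = max(0.0, g[2] - g[0]) * max(0.0, g[3] - g[1])
    union = area_p + area_g - inter
    return inter / union if union > 0 else 0.0


# Assumed routing table: each sample carries a task type that selects its reward.
REWARD_ROUTER: Dict[str, Callable[[str, str], float]] = {
    "multiple_choice": exact_match_reward,
    "numeric": numeric_reward,
    "grounding": box_iou_reward,
}


def routed_reward(task_type: str, prediction: str, target: str) -> float:
    """Dispatch to the reward function registered for this task type."""
    if task_type not in REWARD_ROUTER:
        raise ValueError(f"no reward registered for task type {task_type!r}")
    return REWARD_ROUTER[task_type](prediction, target)
```

Under these assumptions, `routed_reward("numeric", "3.14", "3.1416")` scores the answer with the tolerance check, while `routed_reward("grounding", "0,0,2,2", "1,1,3,3")` scores it by box overlap. Keeping the dispatch in a single table means a new answer format needs only one new function and one new entry.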

Code & Models

Models

Datasets

zlab-princeton/Vero-600k
dataset · 60k downloads
