One RL to See Them All: Visual Triple Unified Reinforcement Learning
Yan Ma, Linge Du, Xuyang Shen, Shaoxiang Chen, Pengfei Li, Qibing Ren, Lizhuang Ma, Yuchao Dai, Pengfei Liu, Junjie Yan

TL;DR
This paper introduces V-Triune, a unified multimodal reinforcement learning framework, and develops Orsta models that outperform specialized models across multiple vision-language tasks.
Contribution
The paper proposes V-Triune, a methodology for unified multimodal RL, and demonstrates its effectiveness with the Orsta models on diverse benchmarks.
Findings
Unified training matches or outperforms specialist models.
Orsta models improve over backbones on MEGA-Bench.
Unified RL enhances reasoning and perception in VLMs.
Abstract
Reinforcement learning (RL) is becoming an important direction for post-training vision-language models (VLMs), but public training methodologies for unified multimodal RL remain much less mature, especially for heterogeneous reasoning and perception-heavy tasks. We propose V-Triune, a Visual Triple Unified Reinforcement Learning methodology for unified multimodal RL. It organizes training around three coordinated abstractions: Sample-Level Reward Routing, Verifier-Level Outcome Verification, and Source-Level Diagnostics. Within this methodology, Dynamic IoU provides localization-specific reward shaping that avoids reward ambiguity under loose thresholds and reward sparsity under strict ones. Built on V-Triune, we develop Orsta (7B, 32B), a family of models jointly trained on eight reasoning and perception tasks. Under matched budgets, unified training matches or outperforms specialist…
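The Dynamic IoU idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the actual thresholds, the schedule, or whether the reward is binary, so everything below is an assumption for illustration: the `SCHEDULE` values, the step-wise switching rule, the binary reward, and the function names `dynamic_threshold` and `dynamic_iou_reward` are hypothetical and are not taken from the paper. The sketch only shows the general pattern of starting with a loose IoU threshold (dense but ambiguous reward) and tightening it as training progresses (precise but sparse reward).

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), with x2 >= x1 and y2 >= y1


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical schedule (NOT from the paper): pairs of
# (fraction of training completed, IoU threshold active from that point on).
SCHEDULE: Sequence[Tuple[float, float]] = ((0.0, 0.5), (0.3, 0.75), (0.6, 0.9))


def dynamic_threshold(progress: float,
                      schedule: Sequence[Tuple[float, float]] = SCHEDULE) -> float:
    """Return the threshold of the latest schedule stage reached at `progress`."""
    threshold = schedule[0][1]
    for start, value in schedule:
        if progress >= start:
            threshold = value
    return threshold


def dynamic_iou_reward(pred: Box, target: Box, progress: float) -> float:
    """Binary reward (an assumption): 1.0 if IoU clears the current threshold."""
    return 1.0 if iou(pred, target) >= dynamic_threshold(progress) else 0.0


if __name__ == "__main__":
    pred, target = (0.0, 0.0, 10.0, 8.0), (0.0, 0.0, 10.0, 10.0)
    for progress in (0.0, 0.4, 0.7):
        print(progress, dynamic_threshold(progress),
              dynamic_iou_reward(pred, target, progress))
```

In this sketch the same prediction (IoU 0.8 against the target) is rewarded early in training and stops being rewarded once the threshold tightens past it, which is the trade-off between ambiguity and sparsity that the abstract describes.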
Code & Models
- 🤗 One-RL-to-See-Them-All/Orsta-7B (model) · 22 dl · ♡ 12
- 🤗 One-RL-to-See-Them-All/Orsta-32B-0321 (model) · 14 dl · ♡ 2
- 🤗 One-RL-to-See-Them-All/Orsta-32B-0326 (model) · 14 dl · ♡ 8
- 🤗 invincible-jha/Orsta-32B-0321 (model) · 2 dl
- 🤗 invincible-jha/Orsta-7B (model) · 5 dl
- 🤗 invincible-jha/Orsta-32B-0326 (model) · 4 dl
