UniRL-Zero: Reinforcement Learning on Unified Models with Joint Language Model and Diffusion Model Experts

Fu-Yun Wang; Han Zhang; Michael Gharbi; Hongsheng Li; Taesung Park

arXiv:2510.17937·cs.LG·October 22, 2025

TL;DR

UniRL-Zero introduces a unified RL framework that enhances multimodal understanding and generation by integrating language and diffusion models, establishing systematic baselines for reinforcement learning in unified models.

Contribution

It proposes a unified reinforcement learning framework that combines language model and diffusion model experts, defines six scenarios for reinforcement learning on unified models, and provides systematic baselines for multimodal understanding and generation.

Findings

01. Enhanced multimodal understanding and reasoning.
02. Improved multimedia generation capabilities.
03. Established systematic RL baselines for unified models.

Abstract

We present UniRL-Zero, a unified reinforcement learning (RL) framework that boosts multimodal language model understanding and reasoning, diffusion model multimedia generation, and their beneficial interaction capabilities within a unified model. Our work defines six scenarios for unified model reinforcement learning, providing systematic baselines for reinforcement learning of unified understanding and generation models. Our code is available at https://github.com/G-U-N/UniRL.

Code & Models

Models

🤗 wangfuyun/PrompRL (model)

Taxonomy

Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Topic Modeling