Co-Reinforcement Learning for Unified Multimodal Understanding and Generation

Jingjing Jiang; Chongjie Si; Jun Luo; Hanwang Zhang; Chao Ma

arXiv:2505.17534 · cs.CV · November 21, 2025

TL;DR

This paper introduces CoRL, a co-reinforcement learning framework for unified multimodal large language models (ULMs). CoRL pairs a unified RL stage for joint optimization with a refined RL stage for task-specific enhancement. The resulting model, ULM-R1, improves on average by 7% on three text-to-image generation datasets and by 23% on nine multimodal understanding benchmarks.

Contribution

The paper proposes CoRL, a framework that uses group relative policy optimization to reinforce multimodal understanding and generation simultaneously in ULMs, with both capabilities trained under a shared policy optimization framework.

Findings

01 ULM-R1 improves by 7% on average across three text-to-image generation datasets

02 ULM-R1 improves by 23% on average across nine multimodal understanding benchmarks

03 Reinforcement learning enhances cross-task synergy in ULMs

Abstract

This paper presents a pioneering exploration of reinforcement learning (RL) via group relative policy optimization for unified multimodal large language models (ULMs), aimed at simultaneously reinforcing generation and understanding capabilities. Through systematic pilot studies, we uncover the significant potential of ULMs to enable the synergistic co-evolution of dual capabilities within a shared policy optimization framework. Building on this insight, we introduce CoRL, a co-reinforcement learning framework comprising a unified RL stage for joint optimization and a refined RL stage for task-specific enhancement. With the proposed CoRL, our resulting model, ULM-R1, achieves average improvements of 7% on three text-to-image generation datasets and 23% on nine multimodal understanding benchmarks. These results demonstrate the effectiveness of CoRL and highlight the substantial benefit…
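
Group relative policy optimization, as named in the abstract, is commonly described as scoring each sampled response against the other responses drawn for the same prompt. A minimal Python sketch of that normalization step follows, assuming the widely used form (reward minus group mean, divided by group standard deviation). The function name, the epsilon value, the choice of population standard deviation, and the sample rewards are illustrative assumptions, not details taken from the paper or the mm-vl/ulm-r1 repository.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own sampling group.

    Hypothetical helper: rewards is a list of scalar rewards, one per
    response sampled for the same prompt. eps (an assumed value) avoids
    division by zero when every response receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for four responses sampled from one prompt.
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

Responses that score above the group mean receive a positive advantage and those below receive a negative one, so no separate value model is needed to provide a baseline. How CoRL defines the rewards for generation and understanding is not covered by this sketch.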

Code & Models

Repositories

mm-vl/ulm-r1
PyTorch · Official