TL;DR
This paper introduces CoRL, a co-reinforcement learning framework for unified multimodal large language models (ULMs). CoRL strengthens both understanding and generation through a unified RL stage for joint optimization followed by a refined RL stage for task-specific enhancement.
Contribution
The paper proposes CoRL, a framework that applies reinforcement learning to multimodal understanding and generation simultaneously within a single unified model. The resulting model, ULM-R1, improves on both text-to-image generation datasets and multimodal understanding benchmarks.
Findings
ULM-R1 achieves an average improvement of 7% on three text-to-image generation datasets
ULM-R1 achieves an average improvement of 23% on nine multimodal understanding benchmarks
Reinforcement learning enhances cross-task synergy in ULMs
Abstract
This paper presents a pioneering exploration of reinforcement learning (RL) via group relative policy optimization for unified multimodal large language models (ULMs), aimed at simultaneously reinforcing generation and understanding capabilities. Through systematic pilot studies, we uncover the significant potential of ULMs to enable the synergistic co-evolution of dual capabilities within a shared policy optimization framework. Building on this insight, we introduce CoRL, a co-reinforcement learning framework comprising a unified RL stage for joint optimization and a refined RL stage for task-specific enhancement. With the proposed CoRL, our resulting model, ULM-R1, achieves average improvements of 7% on three text-to-image generation datasets and 23% on nine multimodal understanding benchmarks. These results demonstrate the effectiveness of CoRL and highlight the substantial benefit…
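The abstract names group relative policy optimization (GRPO) as the underlying RL method. As a rough illustration of its core idea only, the sketch below scores each sampled response relative to the other responses drawn for the same prompt, by normalizing rewards within the group. The function name, the eps constant, and the example rewards are hypothetical, and the use of the population standard deviation is an assumption (implementations differ on this). The paper's actual reward design and its two-stage training procedure are not described in this summary and are not reproduced here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of responses sampled for the same prompt.

    Returns (r_i - mean) / (std + eps) for each reward r_i.
    eps is a hypothetical small constant that avoids division by zero
    when every response in the group receives the same reward.
    """
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population standard deviation
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four responses sampled for a single prompt.
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

Responses that score above the group average receive a positive advantage and those below receive a negative one, so no separate learned value model is needed to provide a baseline.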
