Uni-Synergy: Bridging Understanding and Generation for Personalized Reasoning via Co-operative Reinforcement Learning

Zijun Shen; Sihan Yang; Ruichuan An; Ziyu Guo; Hao Liang; Ming Lu; Renrui Zhang; Wentao Zhang

arXiv:2605.10445 · cs.CV · May 12, 2026

TL;DR

This paper introduces Sync-R1, a reinforcement learning framework that enhances personalized understanding and generation in multimodal models through explicit reasoning and dual-task synergy.

Contribution

The paper presents an end-to-end reinforcement learning approach with dynamic group scaling, together with a new benchmark, aimed at improving personalized reasoning and generation in unified multimodal models.

Findings

01 Sync-R1 achieves state-of-the-art performance in personalized reasoning and generation.

02 The proposed methods improve convergence speed and reduce gradient variance.

03 Experimental results demonstrate robust personalization without cold-start issues.

Abstract

Unified Multimodal Models (UMMs) excel in general tasks but struggle to bridge the gap between personalized understanding and generation. Prior works largely rely on implicit token-level alignment via supervised fine-tuning, which fails to fully capture the potential synergy between comprehension and creation. In this work, we propose Sync-R1, an end-to-end reinforcement learning framework that jointly optimizes personalized understanding and generation within a single, explicit reasoning loop. Through this unified feedback process, Sync-R1 enables personalized comprehension to guide content creation, while the resulting generation quality reciprocally refines understanding within an integrated reward landscape. To efficiently orchestrate this dual-task synergy, we introduce Sync-GRPO, a reinforcement learning method utilizing an ensemble reward system. Furthermore, we propose Dynamic…
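
The abstract mentions an ensemble reward system and a GRPO-style method but is cut off before any details are given. As a rough illustration only, the sketch below shows the generic group-relative advantage computation used in GRPO-style training: rewards for several samples of the same prompt are standardized within the group. The weighted-sum form of the ensemble reward, the component names (`understanding`, `generation`), the weights, and the scores are all assumptions made for this example. They are not taken from the paper and may differ from what Sync-GRPO actually does.

```python
from statistics import mean, pstdev
from typing import Dict, List


def ensemble_reward(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine several reward signals into one scalar (assumed: a weighted sum)."""
    return sum(weights[name] * scores[name] for name in weights)


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Standardize rewards within one group of samples drawn for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical reward components and weights, for illustration only.
weights = {"understanding": 0.5, "generation": 0.5}
group_scores = [
    {"understanding": 1.0, "generation": 0.8},
    {"understanding": 0.0, "generation": 0.4},
    {"understanding": 1.0, "generation": 0.2},
    {"understanding": 0.0, "generation": 0.6},
]

rewards = [ensemble_reward(s, weights) for s in group_scores]
advantages = group_relative_advantages(rewards)

for r, a in zip(rewards, advantages):
    print(f"reward={r:.2f}  advantage={a:+.3f}")
```

Samples that score above the group mean receive a positive advantage and those below receive a negative one, so no separate value network is needed to form a baseline. How the paper's dynamic group scaling changes the group itself is not covered here, since the abstract is truncated before describing it.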

Code & Models

Repositories

arctanxarc/UniCTokens (GitHub)
