TL;DR
This paper introduces Sync-R1, a reinforcement learning framework that enhances personalized understanding and generation in multimodal models through explicit reasoning and dual-task synergy.
Contribution
It presents a novel end-to-end reinforcement learning approach with dynamic group scaling and a new benchmark, improving personalized reasoning and generation in unified multimodal models.
Findings
Sync-R1 achieves state-of-the-art performance in personalized reasoning and generation.
The proposed methods improve convergence speed and reduce gradient variance.
Experimental results demonstrate robust personalization without cold-start issues.
Abstract
Unified Multimodal Models (UMMs) excel in general tasks but struggle to bridge the gap between personalized understanding and generation. Prior works largely rely on implicit token-level alignment via supervised fine-tuning, which fails to fully capture the potential synergy between comprehension and creation. In this work, we propose Sync-R1, an end-to-end reinforcement learning framework that jointly optimizes personalized understanding and generation within a single, explicit reasoning loop. Through this unified feedback process, Sync-R1 enables personalized comprehension to guide content creation, while the resulting generation quality reciprocally refines understanding within an integrated reward landscape. To efficiently orchestrate this dual-task synergy, we introduce Sync-GRPO, a reinforcement learning method utilizing an ensemble reward system. Furthermore, we propose Dynamic…
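The abstract does not say how Sync-GRPO combines its ensemble rewards or computes advantages. As a rough illustration only, the sketch below assumes the standard GRPO recipe, where the rewards of a group of responses sampled for the same prompt are normalized within that group. It also assumes a hypothetical ensemble reward formed as a weighted sum of an understanding score and a generation score. `RewardWeights`, `ensemble_reward`, `group_relative_advantages`, and the equal weights are illustrative names and values, not taken from the paper.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


# Hypothetical ensemble weights (assumption): the paper's actual reward
# components and their weighting are not given in the abstract.
@dataclass
class RewardWeights:
    understanding: float = 0.5
    generation: float = 0.5


def ensemble_reward(understanding_score: float, generation_score: float,
                    w: RewardWeights) -> float:
    """Combine per-task scores into one scalar reward via a weighted sum (assumed form)."""
    return w.understanding * understanding_score + w.generation * generation_score


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Standard GRPO-style advantage: normalize each reward by the group's mean and std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the sampled group
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # One prompt, four sampled rollouts, each scored on both tasks (made-up scores).
    weights = RewardWeights()
    scores = [(1.0, 0.8), (0.0, 0.4), (1.0, 0.2), (0.0, 0.6)]
    rewards = [ensemble_reward(u, g, weights) for u, g in scores]
    print(group_relative_advantages(rewards))
```

Rollouts scoring above the group mean receive a positive advantage and those below receive a negative one. Normalizing within the group, rather than against a learned value function, is the defining design choice of GRPO.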
