Co-GRPO: Co-Optimized Group Relative Policy Optimization for Masked Diffusion Model

Renping Zhou; Zanlin Ni; Tianyi Chen; Zeyu Liu; Yang Yue; Yulin Wang; Yuxuan Wang; Jingshu Liu; Gao Huang

arXiv:2512.22288 · cs.LG · December 30, 2025

TL;DR

Co-GRPO introduces a unified approach to optimize both model parameters and inference schedules in Masked Diffusion Models, significantly enhancing generation quality by aligning training with inference procedures through trajectory-level policy optimization.

Contribution

The paper proposes Co-GRPO, a method that reformulates Masked Diffusion Model generation as a Markov Decision Process and applies group relative policy optimization to jointly optimize model and schedule parameters.
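The group-relative part of GRPO can be illustrated with a short sketch. It assumes the standard GRPO recipe: rewards are normalized within a group of trajectories sampled for the same prompt, and each trajectory is weighted by a PPO-style clipped probability ratio. This is not the authors' code. The idea that a trajectory's log-probability sums both the model's token choices and the schedule's choices is our hypothetical reading of "jointly optimize model and schedule parameters", and the function names are illustrative only.

```python
import math
import statistics
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Standard GRPO advantage: normalize each reward against its own group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_surrogate(
    logp_new: Sequence[float],
    logp_old: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,
) -> float:
    """Average PPO-style clipped objective over one group (to be maximized).

    Hypothetical: each logp is a trajectory-level log-probability, assumed here to be
    the sum of log p(model tokens) and log p(schedule decisions) over all decoding steps.
    """
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

Because advantages are computed relative to the group mean, no learned value function is needed. Trajectories that score above their siblings are pushed up, and those below are pushed down.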

Findings

01. Improved generation quality across four benchmarks.
02. Effective joint optimization of model and schedule parameters.
03. Alignment of training and inference procedures enhances performance.

Abstract

Recently, Masked Diffusion Models (MDMs) have shown promising potential across vision, language, and cross-modal generation. However, a notable discrepancy exists between their training and inference procedures. In particular, MDM inference is a multi-step, iterative process governed not only by the model itself but also by various schedules that dictate the token-decoding trajectory (e.g., how many tokens to decode at each step). In contrast, MDMs are typically trained using a simplified, single-step BERT-style objective that masks a subset of tokens and predicts all of them simultaneously. This step-level simplification fundamentally disconnects the training paradigm from the trajectory-level nature of inference, so the inference schedules are never optimized during training. In this paper, we introduce Co-GRPO, which reformulates MDM generation as a unified Markov Decision Process…
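The gap described above can be made concrete with a minimal sketch of schedule-driven inference. It assumes a MaskGIT-style cosine schedule for how many tokens to decode at each step and confidence-based selection of which masked positions to commit. Both are common hand-designed choices, not necessarily the ones used in the paper, and `make_toy_predictor` is a hypothetical stand-in for a real MDM. A single-step BERT-style training objective never sees the `cosine_schedule` choice below, which is the kind of inference-time decision the abstract says goes unoptimized.

```python
import math
import random
from typing import Callable, List, Optional, Tuple

MASK = None  # placeholder value for a still-masked position

# Hypothetical predictor signature: (current tokens, position) -> (token, confidence)
Predictor = Callable[[List[Optional[int]], int], Tuple[int, float]]


def cosine_schedule(seq_len: int, num_steps: int) -> List[int]:
    """Tokens to unmask per step; the masked fraction after step t is cos(pi/2 * t/T)."""
    remaining = [
        math.floor(seq_len * math.cos(math.pi / 2 * t / num_steps))
        for t in range(num_steps + 1)
    ]
    return [remaining[t] - remaining[t + 1] for t in range(num_steps)]


def iterative_decode(seq_len: int, num_steps: int, predict: Predictor) -> List[Optional[int]]:
    """Multi-step decoding: at each step, commit the k most confident predictions."""
    tokens: List[Optional[int]] = [MASK] * seq_len
    for k in cosine_schedule(seq_len, num_steps):
        masked = [i for i, tok in enumerate(tokens) if tok is MASK]
        preds = {i: predict(tokens, i) for i in masked}
        for i in sorted(masked, key=lambda i: preds[i][1], reverse=True)[:k]:
            tokens[i] = preds[i][0]
    return tokens


def make_toy_predictor(seed: int = 0) -> Predictor:
    """Hypothetical stand-in model: fixed token per position, random confidence."""
    rng = random.Random(seed)

    def predict(tokens: List[Optional[int]], i: int) -> Tuple[int, float]:
        return i % 7, rng.random()

    return predict
```

Because the per-step counts telescope to `seq_len`, every position is unmasked by the final step. Swapping the schedule changes the decoding trajectory without touching the model, which is why the paper treats schedule parameters as something to optimize alongside the model.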

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning