MAR-GRPO: Stabilized GRPO for AR-diffusion Hybrid Image Generation

Xiaoxiao Ma; Jiachen Lei; Tianfei Ren; Jie Huang; Siming Fu; Aiming Hao; Jiahong Wu; Xiangxiang Chu; Feng Zhao

arXiv:2604.06966 · cs.CV · April 9, 2026

TL;DR

This paper introduces MAR-GRPO, a stabilized reinforcement learning framework for hybrid autoregressive-diffusion image generation that reduces gradient noise and improves both training stability and image quality.

Contribution

The paper proposes multi-trajectory expectation and token-wise uncertainty estimation to stabilize RL training and improve image generation quality in masked autoregressive (MAR) models.

Findings

01. Improved visual quality over baseline models

02. Enhanced training stability and convergence

03. Better spatial structure understanding in generated images

Abstract

Reinforcement learning (RL) has been successfully applied to autoregressive (AR) and diffusion models. However, extending RL to hybrid AR-diffusion frameworks remains challenging due to interleaved inference and noisy log-probability estimation. In this work, we study masked autoregressive models (MAR) and show that the diffusion head plays a critical role in training dynamics, often introducing noisy gradients that lead to instability and early performance saturation. To address this issue, we propose a stabilized RL framework for MAR. We introduce multi-trajectory expectation (MTE), which estimates the optimization direction by averaging over multiple diffusion trajectories, thereby reducing diffusion-induced gradient noise. To avoid over-smoothing, we further estimate token-wise uncertainty from multiple trajectories and apply multi-trajectory optimization only to the top-k%…
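The two ideas named in the abstract, averaging over several diffusion trajectories and estimating token-wise uncertainty from them, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the use of the sample standard deviation as the uncertainty measure, and the reading of "top-k%" as "the k% of tokens with the highest uncertainty" are all assumptions made for illustration; the abstract is cut off before it specifies the selection rule.

```python
import numpy as np


def multi_trajectory_stats(logp):
    """Per-token mean and spread over trajectories.

    logp: array of shape (num_trajectories, num_tokens) holding noisy
    per-token log-probability estimates, one row per diffusion trajectory.
    Hypothetical helper, not taken from the paper.
    """
    mean = logp.mean(axis=0)          # multi-trajectory average per token
    std = logp.std(axis=0, ddof=1)    # sample std, used here as uncertainty
    return mean, std


def select_uncertain_tokens(std, top_k_percent):
    """Boolean mask marking the top_k_percent most uncertain tokens.

    ASSUMPTION: ranking by uncertainty is one reading of the truncated
    abstract, not a confirmed detail of the method.
    """
    num_tokens = std.shape[0]
    k = max(1, int(np.ceil(num_tokens * top_k_percent / 100.0)))
    idx = np.argsort(-std)[:k]        # indices of the k largest std values
    mask = np.zeros(num_tokens, dtype=bool)
    mask[idx] = True
    return mask


def mixed_logp(logp, top_k_percent):
    """Use the multi-trajectory mean on selected tokens only.

    All other tokens keep the single-trajectory estimate from row 0.
    """
    mean, std = multi_trajectory_stats(logp)
    mask = select_uncertain_tokens(std, top_k_percent)
    return np.where(mask, mean, logp[0]), mask
```

As a usage example, calling `mixed_logp(logp, 25.0)` on an array with 3 trajectories and 4 tokens replaces the estimate for the single noisiest token with its average across trajectories and leaves the other three tokens at their first-trajectory values.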

Code & Models

Repositories

AMAP-ML/mar-grpo (GitHub)