AlphaGRPO: Unlocking Self-Reflective Multimodal Generation in UMMs via Decompositional Verifiable Reward

Runhui Huang; Jie Wu; Rui Yang; Zhe Liu; Hengshuang Zhao

arXiv:2605.12495·cs.CV·May 13, 2026

TL;DR

AlphaGRPO applies Group Relative Policy Optimization (GRPO) to AR-Diffusion Unified Multimodal Models (UMMs), using rewards decomposed into atomic, verifiable questions. The training targets reasoning text-to-image generation and self-reflective refinement, and also improves editing.

Contribution

The paper introduces AlphaGRPO, which combines Group Relative Policy Optimization (GRPO) with a novel Decompositional Verifiable Reward (DVReward) to provide stable, interpretable supervision for multimodal generation.

Findings

01 Achieves robust improvements on multiple multimodal benchmarks.

02 Enhances reasoning and self-reflective capabilities in UMMs.

03 Improves editing tasks without specific training on editing.

Abstract

In this paper, we propose AlphaGRPO, a novel framework that applies Group Relative Policy Optimization (GRPO) to AR-Diffusion Unified Multimodal Models (UMMs) to enhance multimodal generation capabilities without an additional cold-start stage. Our approach unlocks the model's intrinsic potential to perform advanced reasoning tasks: Reasoning Text-to-Image Generation, where the model actively infers implicit user intents, and Self-Reflective Refinement, where it autonomously diagnoses and corrects misalignments in generated outputs. To address the challenge of providing stable supervision for real-world multimodal generation, we introduce the Decompositional Verifiable Reward (DVReward). Unlike holistic scalar rewards, DVReward utilizes an LLM to decompose complex user requests into atomic, verifiable semantic and quality questions, which are then evaluated by a general MLLM to provide…
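
As a rough illustration of how a decomposed reward could feed GRPO, the sketch below scores each sampled output by the fraction of atomic yes/no questions a verifier answered positively, then normalizes rewards within the group of samples drawn for one prompt. The abstract is cut off before it says how DVReward combines the per-question answers, so the simple averaging here is an assumption, and the names `dv_reward` and `group_relative_advantages` are hypothetical. Only the group-relative normalization (reward minus group mean, divided by group standard deviation) follows the standard GRPO formulation.

```python
from statistics import mean, pstdev
from typing import List


def dv_reward(answers: List[bool]) -> float:
    """Score one sample from its atomic yes/no verifier answers.

    ASSUMPTION: the reward is the plain fraction of 'yes' answers.
    The source does not state the actual aggregation rule.
    """
    if not answers:
        raise ValueError("need at least one atomic question")
    return sum(answers) / len(answers)


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Standard GRPO-style advantage: normalize rewards within one group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    # eps guards against division by zero when all rewards are equal
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical verifier answers: 4 samples for one prompt, 4 atomic questions each.
group_answers = [
    [True, True, True, True],
    [True, True, False, False],
    [True, False, False, False],
    [True, True, True, False],
]

rewards = [dv_reward(a) for a in group_answers]
advantages = group_relative_advantages(rewards)

for r, a in zip(rewards, advantages):
    print(f"reward={r:.2f} advantage={a:+.3f}")
```

Samples that satisfy more of the atomic questions than the group average receive a positive advantage and are reinforced; those below the average are pushed down. Because each question is checked separately, a low reward can be traced back to the specific questions that failed.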
