Graph-GRPO: Training Graph Flow Models with Reinforcement Learning
Baoheng Zhu, Deyu Bo, Delvin Ce Zhang, Xiao Wang

TL;DR
Graph-GRPO introduces a reinforcement learning framework for training graph flow models with verifiable rewards, improving graph generation quality and efficiency, especially in molecular optimization tasks.
Contribution
The paper derives an analytical transition probability for graph flow models (GFMs) and proposes a localized exploration strategy, enabling effective RL training and self-improvement.
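The localized exploration idea can be illustrated with a minimal sketch. The summary only says that selected nodes and edges are randomly perturbed and then regenerated, so everything below is an assumption made for illustration: the function name `perturb_graph`, the `MASK` placeholder, the `frac` parameter, and the choice to mask a node together with its incident edges. The regeneration step, which the model would perform, is not shown.

```python
import random

MASK = -1  # hypothetical placeholder meaning "to be regenerated by the model"

def perturb_graph(node_types, adj, frac=0.2, seed=None):
    """Mask a random subset of nodes and all edges incident to them.

    node_types: list[int], one categorical label per node.
    adj: symmetric list[list[int]] of categorical edge labels (0 = no edge).
    Returns (new_node_types, new_adj, picked); the inputs are not mutated.
    """
    rng = random.Random(seed)
    n = len(node_types)
    # Perturb at least one node when the graph is non-empty (assumed policy).
    k = min(n, max(1, round(frac * n)))
    picked = sorted(rng.sample(range(n), k))

    new_nodes = list(node_types)
    new_adj = [row[:] for row in adj]
    for i in picked:
        new_nodes[i] = MASK
        for j in range(n):
            if j != i:
                # Keep the matrix symmetric while masking incident edges.
                new_adj[i][j] = MASK
                new_adj[j][i] = MASK
    return new_nodes, new_adj, picked

# A 5-node path graph with toy node labels.
nodes = [0, 1, 2, 1, 0]
adj = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
masked_nodes, masked_adj, picked = perturb_graph(nodes, adj, frac=0.4, seed=0)
```

Only the picked region is marked for regeneration, while the remaining nodes and the edges between them are left untouched; this is what keeps the exploration local to part of the graph.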
Findings
Achieves 95.0% validity on synthetic datasets.
Attains state-of-the-art results in molecular optimization.
Uses only 50 denoising steps for high-quality generation.
Abstract
Graph generation is a fundamental task with broad applications, such as drug discovery. Recently, discrete flow matching-based graph generation, also known as the graph flow model (GFM), has emerged due to its superior performance and flexible sampling. However, effectively aligning GFMs with complex human preferences or task-specific objectives remains a significant challenge. In this paper, we propose Graph-GRPO, an online reinforcement learning (RL) framework for training GFMs under verifiable rewards. Our method makes two key contributions: (1) We derive an analytical expression for the transition probability of GFMs, replacing Monte Carlo sampling and enabling fully differentiable rollouts for RL training; (2) We propose a refinement strategy that randomly perturbs specific nodes and edges in a graph and regenerates them, allowing for localized exploration and self-improvement of…
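For readers unfamiliar with the GRPO part of the name: in standard Group Relative Policy Optimization, several outputs are sampled as a group, and each output's reward is normalized against the group's mean and standard deviation to obtain its advantage. The sketch below shows only that normalization step. Whether Graph-GRPO uses exactly this form is an assumption, since the abstract is truncated here; the function name `group_relative_advantages` and the `eps` stabilizer are illustrative.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its sampling group (standard GRPO form).

    rewards: list[float], one verifiable reward per generated graph in a group.
    eps: small constant (assumed) that avoids division by zero when all
         rewards in the group are identical.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: a group of four generated graphs with a binary validity reward.
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

Samples that score above the group average receive a positive advantage and the rest a negative one, so no separate learned value function is needed for the baseline.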
Taxonomy
Topics: Advanced Graph Neural Networks · Graph Theory and Algorithms · Mobile Crowdsensing and Crowdsourcing
