Neighbor GRPO: Contrastive ODE Policy Optimization Aligns Flow Models

Dailan He; Guanlin Feng; Xingtong Ge; Yazhe Niu; Yi Zhang; Bingqi Ma; Guanglu Song; Yu Liu; Hongsheng Li

arXiv:2511.16955 · cs.CV · March 19, 2026

TL;DR

Neighbor GRPO is a deterministic, ODE-based contrastive policy optimization method for aligning flow models without relying on stochastic differential equations (SDEs). The authors report better efficiency, convergence, and generation quality than SDE-based GRPO.

Contribution

The paper proposes Neighbor GRPO, an SDE-free alignment algorithm for flow models that builds on a contrastive-learning view of SDE-based GRPO and a theoretical connection to policy gradient methods.

Findings

01 Outperforms SDE-based methods in training speed and quality

02 Maintains efficiency and compatibility with high-order solvers

03 Demonstrates superior convergence and generation results

Abstract

Group Relative Policy Optimization (GRPO) has shown promise in aligning image and video generative models with human preferences. However, applying it to modern flow matching models is challenging because of their deterministic sampling paradigm. Current methods address this issue by converting Ordinary Differential Equations (ODEs) to Stochastic Differential Equations (SDEs), which introduces stochasticity. However, SDE-based GRPO suffers from inefficient credit assignment and is incompatible with the high-order solvers used for fewer-step sampling. In this paper, we first reinterpret existing SDE-based GRPO methods from a distance optimization perspective, revealing their underlying mechanism as a form of contrastive learning. Based on this insight, we propose Neighbor GRPO, a novel alignment algorithm that completely bypasses the need for SDEs. Neighbor GRPO generates a diverse…
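
For context, GRPO scores each generated sample relative to the other samples produced for the same prompt. The sketch below shows the group-relative advantage normalization commonly used in GRPO. It is not taken from this paper: the function name `group_relative_advantages`, the `eps` constant, and the reward values are illustrative assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of samples for the same prompt.

    Hypothetical helper: subtracts the group mean and divides by the
    group (population) standard deviation, with eps to avoid division by zero.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up reward scores for four samples generated from one prompt.
rewards = [0.2, 0.5, 0.9, 0.4]
advantages = group_relative_advantages(rewards)

# Samples above the group mean get positive advantages, those below get negative.
for r, a in zip(rewards, advantages):
    print(f"reward={r:.2f} advantage={a:+.3f}")
```

Because the advantage is defined only within a group, the method needs several distinct samples per prompt, which is why a deterministic sampler is a challenge for GRPO.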

Taxonomy

Topics: Reinforcement Learning in Robotics · Recommender Systems and Techniques · Stochastic Gradient Optimization Techniques