Euphonium: Steering Video Flow Matching via Process Reward Gradient Guided Stochastic Dynamics

Ruizhe Zhong; Jiesong Lian; Xiaoyue Mi; Zixiang Zhou; Yuan Zhou; Qinglin Lu; Junchi Yan

arXiv:2602.04928·cs.LG·February 9, 2026

Open Access

TL;DR

Euphonium steers flow matching for video generation with stochastic sampling dynamics guided by process reward gradients, improving alignment with human preferences and speeding up training convergence by 1.66 times.

Contribution

The paper presents a theoretically grounded method that guides sampling with process reward gradients and internalizes that guidance into the flow model, and it reports superior performance on text-to-video tasks.

Findings

01

Achieves better alignment with human preferences in text-to-video generation.

02

Speeds up training convergence by 1.66 times.

03

Unifies existing sampling methods under a new theoretical framework.

Abstract

While online Reinforcement Learning has emerged as a crucial technique for aligning flow matching models with human preferences, current approaches are hindered by inefficient exploration during training rollouts. Relying on undirected stochasticity and sparse outcome rewards, these methods struggle to discover high-reward samples, resulting in data-inefficient and slow optimization. To address these limitations, we propose Euphonium, a novel framework that steers generation via process reward gradient guided dynamics. Our key insight is to formulate the sampling process as a theoretically principled Stochastic Differential Equation that explicitly incorporates the gradient of a Process Reward Model into the flow drift. This design enables dense, step-by-step steering toward high-reward regions, advancing beyond the unguided exploration in prior works, and theoretically encompasses…
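The abstract's central idea, adding the gradient of a reward model to the flow drift of an SDE, can be illustrated with a one-dimensional toy. The sketch below is not the paper's formulation, which the truncated abstract does not spell out. It is a generic Euler-Maruyama integrator in which `toy_velocity`, `toy_reward_grad`, the `guidance` scale, and the noise level `sigma` are all hypothetical stand-ins chosen for illustration.

```python
import math
import random


def toy_velocity(x, t):
    """Hypothetical stand-in for a learned flow velocity field (pulls x toward 0)."""
    return -x


def toy_reward_grad(x, target=2.0):
    """Gradient of an assumed toy reward r(x) = -(x - target)^2."""
    return -2.0 * (x - target)


def guided_sde_sample(x0, steps=100, sigma=0.5, guidance=1.0, rng=None):
    """Euler-Maruyama integration of dx = [v(x, t) + guidance * grad r(x)] dt + sigma dW.

    With guidance=0 this reduces to unguided stochastic sampling.
    """
    rng = rng or random.Random(0)
    dt = 1.0 / steps
    x = x0
    for i in range(steps):
        t = i * dt
        # Dense, per-step steering: the reward gradient is added to the drift.
        drift = toy_velocity(x, t) + guidance * toy_reward_grad(x)
        x = x + drift * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
    return x


def mean_final(guidance, n=200, seed=0):
    """Average final sample over n rollouts started from standard normal noise."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        total += guided_sde_sample(rng.gauss(0.0, 1.0), guidance=guidance, rng=rng)
    return total / n


if __name__ == "__main__":
    print("unguided mean:", mean_final(0.0))
    print("guided mean:  ", mean_final(1.0))
```

In this toy, the guided rollouts end closer to the reward's maximum at `target` than the unguided ones, which is the exploration benefit the abstract describes in qualitative terms. A real system would replace the toy functions with a video flow model and a Process Reward Model evaluated on intermediate states.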

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Human Motion and Animation