Value Gradient Guidance for Flow Matching Alignment
Zhen Liu, Tim Z. Xiao, Carles Domingo-Enrich, Weiyang Liu, Dinghuai Zhang

TL;DR
This paper introduces VGG-Flow, a gradient guidance method grounded in optimal control theory that fine-tunes flow matching models efficiently while preserving the pretrained prior, with text-to-image alignment as the main application.
Contribution
The paper proposes VGG-Flow, a new gradient-matching approach that leverages value functions and optimal control theory for fast, effective fine-tuning of flow matching models.
Findings
VGG-Flow achieves effective alignment with limited computational resources.
The method preserves prior distributions while adapting to new preferences.
Empirical results on Stable Diffusion 3 demonstrate improved fine-tuning performance.
Abstract
While methods exist for aligning flow matching models--a popular and effective class of generative models--with human preferences, existing approaches fail to achieve both adaptation efficiency and probabilistically sound prior preservation. In this work, we leverage the theory of optimal control and propose VGG-Flow, a gradient-matching-based method for finetuning pretrained flow matching models. The key idea behind this algorithm is that the optimal difference between the finetuned velocity field and the pretrained one should be matched with the gradient field of a value function. This method not only incorporates first-order information from the reward model but also benefits from heuristic initialization of the value function to enable fast adaptation. Empirically, we show on a popular text-to-image flow matching model, Stable Diffusion 3, that our method can finetune flow matching…
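The key idea in the abstract (match the difference between the finetuned and pretrained velocity fields to the gradient of a value function) can be illustrated with a toy one-dimensional sketch. This is not the paper's implementation: the reward, the value function (here simply set to the reward as a stand-in for a heuristic initialization), the base velocity field, the scale `beta`, and the sign convention (the value is treated as a reward-to-go, so the residual points along its gradient) are all assumptions made for illustration, and every function name is hypothetical.

```python
# Toy 1-D illustration of a gradient-matching loss.
# All names and modeling choices below are hypothetical, not from the paper.

def reward(x):
    # Assumed toy reward that peaks at x = 2.
    return -(x - 2.0) ** 2

def value(x, t):
    # Assumed heuristic value function: the reward itself, ignoring t.
    return reward(x)

def grad_value(x, t, eps=1e-5):
    # Central finite difference of the value function with respect to x.
    return (value(x + eps, t) - value(x - eps, t)) / (2.0 * eps)

def v_base(x, t):
    # Stand-in for a pretrained velocity field.
    return -x

def make_v_finetuned(scale):
    # One-parameter family: base field plus scale times the value gradient.
    def v(x, t):
        return v_base(x, t) + scale * grad_value(x, t)
    return v

def gradient_matching_loss(v_ft, beta, samples):
    # Mean squared mismatch between the velocity residual
    # (finetuned minus base) and beta times the value gradient.
    total = 0.0
    for x, t in samples:
        residual = v_ft(x, t) - v_base(x, t)
        total += (residual - beta * grad_value(x, t)) ** 2
    return total / len(samples)

# Grid of (x, t) points standing in for samples along flow trajectories.
samples = [(x / 2.0, t / 4.0) for x in range(-4, 5) for t in range(5)]
beta = 0.5

loss_matched = gradient_matching_loss(make_v_finetuned(beta), beta, samples)
loss_unchanged = gradient_matching_loss(make_v_finetuned(0.0), beta, samples)
```

In this toy setting the loss vanishes when the residual equals `beta` times the value gradient and is strictly positive when the finetuned field is left equal to the base field. In practice the velocity field and value function would be neural networks trained by gradient descent; that part is not shown here.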
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Human Motion and Animation
