Finite Difference Flow Optimization for RL Post-Training of Text-to-Image Models

David McAllister; Miika Aittala; Tero Karras; Janne Hellsten; Angjoo Kanazawa; Timo Aila; Samuli Laine

arXiv:2603.12893 · cs.CV · March 16, 2026

TL;DR

This paper introduces an online reinforcement learning method for post-training text-to-image diffusion models. It reduces update variance by comparing paired sampling trajectories and improves image quality and prompt alignment.

Contribution

It proposes a novel RL approach that treats the entire sampling process as a single action, which leads to faster convergence and higher image quality and prompt alignment than previous methods.
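To make "the entire sampling process as a single action" concrete, the derivation below works in a deliberately simplified setting that is our assumption, not the paper's model: the final sample of the whole sampler is x = μ + z with z ~ N(0, σ²I), where μ stands in for the model parameters and r is any integrable reward on the final sample only. Under these assumptions, two independent samples give a paired, finite-difference-style estimate ĝ whose expectation is exactly the gradient of the expected reward; subtracting the partner's reward acts like a baseline, a standard variance-reduction device. How closely this matches the authors' actual update is not established here.

```latex
\begin{aligned}
x_a &= \mu + z_a,\quad x_b = \mu + z_b,\qquad z_a, z_b \overset{\text{iid}}{\sim} \mathcal{N}(0, \sigma^2 I),\\
\hat g &= \frac{\bigl(r(x_a) - r(x_b)\bigr)\,(x_a - x_b)}{2\sigma^2},\\
\mathbb{E}[\hat g] &= \frac{\mathbb{E}[r(x_a)\,z_a] + \mathbb{E}[r(x_b)\,z_b]}{2\sigma^2}
 = \frac{\mathbb{E}[r(x)\,z]}{\sigma^2}
 = \nabla_\mu\, \mathbb{E}_{x \sim \mathcal{N}(\mu,\, \sigma^2 I)}\bigl[r(x)\bigr].
\end{aligned}
```

The cross terms vanish because r(x_a) is independent of z_b (and r(x_b) of z_a), and the last step is the Gaussian score-function identity.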

Findings

01. Faster convergence than previous methods

02. Higher image quality and prompt alignment

03. Effective with both vision language model rewards and off-the-shelf quality metrics

Abstract

Reinforcement learning (RL) has become a standard technique for post-training diffusion-based image synthesis models, as it enables learning from reward signals to explicitly improve desirable aspects such as image quality and prompt alignment. In this paper, we propose an online RL variant that reduces the variance in the model updates by sampling paired trajectories and pulling the flow velocity in the direction of the more favorable image. Unlike existing methods that treat each sampling step as a separate policy action, we consider the entire sampling process as a single action. We experiment with both high-quality vision language models and off-the-shelf quality metrics for rewards, and evaluate the outputs using a broad set of metrics. Our method converges faster and yields higher output quality and prompt alignment than previous approaches.
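The paired-trajectory idea in the abstract can be illustrated with a tiny toy. This sketch is our own construction, not the authors' code: the "image" is a 4-D vector, the flow's velocity field is a single learnable constant vector `theta`, the reward is a hypothetical negative squared distance to a target, and `sigma`, `lr`, and `num_steps` are arbitrary choices. Each update draws two trajectories from different starting noise, runs the whole Euler sampler for each (one action per trajectory), and pulls the velocity toward whichever final sample scored higher, in proportion to the reward gap.

```python
import random


def sample(theta, z, num_steps=8):
    """Euler-integrate the toy flow dx/dt = theta from t=0 (noise z) to t=1.

    Hypothetical velocity field: a constant learnable vector `theta`.
    The whole loop is one "action": only the final sample is rewarded.
    """
    x = list(z)
    dt = 1.0 / num_steps
    for _ in range(num_steps):
        x = [xi + dt * vi for xi, vi in zip(x, theta)]
    return x


def reward(x, target):
    """Hypothetical reward: negative squared distance to a target vector."""
    return -sum((xi - ti) ** 2 for xi, ti in zip(x, target))


def paired_update(theta, target, rng, sigma=0.5, lr=0.01):
    """One update from two trajectories that differ only in their start noise."""
    z_a = [rng.gauss(0.0, sigma) for _ in theta]
    z_b = [rng.gauss(0.0, sigma) for _ in theta]
    x_a = sample(theta, z_a)
    x_b = sample(theta, z_b)
    dr = reward(x_a, target) - reward(x_b, target)
    # Pull the velocity along (x_a - x_b), signed and scaled by the reward gap:
    # toward x_a if it scored higher, toward x_b otherwise. In this toy,
    # dr * (x_a - x_b) / (2 * sigma**2) is an unbiased estimate of grad E[reward].
    scale = lr * dr / (2.0 * sigma ** 2)
    return [v + scale * (a - b) for v, a, b in zip(theta, x_a, x_b)]


def train(target, num_updates=2000, sigma=0.5, lr=0.01, seed=0):
    """Run repeated paired updates starting from a zero velocity."""
    rng = random.Random(seed)
    theta = [0.0] * len(target)
    for _ in range(num_updates):
        theta = paired_update(theta, target, rng, sigma, lr)
    return theta


if __name__ == "__main__":
    learned = train([1.0, -2.0, 0.5, 3.0])
    print("learned velocity:", [round(v, 2) for v in learned])
```

Starting from zeros, `train([1.0, -2.0, 0.5, 3.0])` should drift toward the target; because each update is stochastic, expect some residual jitter rather than exact convergence. A real text-to-image model would replace the constant velocity with a prompt-conditioned network and the hand-made reward with a vision language model or quality metric, which this toy does not attempt.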

Code & Models

Models

Hugging Face model: nvidia/finite-difference-flow-optimization

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications