Flow-Factory: A Unified Framework for Reinforcement Learning in Flow-Matching Models
Bowen Ping, Chengyou Jia, Minnan Luo, Hangwei Qian, Ivor Tsang

TL;DR
Flow-Factory is a modular, unified framework that simplifies reinforcement learning in flow-matching models, enabling easy integration, rapid prototyping, and scalable deployment for diverse architectures and algorithms.
Contribution
It introduces a flexible, registry-based architecture that decouples algorithms, models, and rewards, facilitating seamless integration and rapid development in reinforcement learning for flow-matching models.
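The registry idea described above can be illustrated with a minimal sketch. This is not Flow-Factory's actual API: the `Registry` class, the `ALGORITHMS` and `REWARDS` instances, and the toy components `MeanBaseline` and `PromptLengthReward` are all hypothetical names chosen for illustration. The sketch only shows how a name-to-class lookup lets a config select an algorithm and a reward independently of each other.

```python
from typing import Callable, Dict, List


class Registry:
    """Maps string names to factories so components can be picked by config."""

    def __init__(self, kind: str) -> None:
        self.kind = kind
        self._entries: Dict[str, Callable] = {}

    def register(self, name: str) -> Callable:
        """Return a decorator that stores the decorated class under `name`."""
        def decorator(obj: Callable) -> Callable:
            if name in self._entries:
                raise KeyError(f"{self.kind} '{name}' is already registered")
            self._entries[name] = obj
            return obj
        return decorator

    def build(self, name: str, **kwargs):
        """Instantiate the component registered under `name`."""
        if name not in self._entries:
            raise KeyError(f"unknown {self.kind} '{name}'")
        return self._entries[name](**kwargs)


# Hypothetical registries; a real framework would likely also have one for models.
ALGORITHMS = Registry("algorithm")
REWARDS = Registry("reward")


@REWARDS.register("prompt_length")
class PromptLengthReward:
    """Toy reward: word count divided by max_len, capped at 1.0."""

    def __init__(self, max_len: int = 10) -> None:
        self.max_len = max_len

    def __call__(self, prompt: str) -> float:
        return min(len(prompt.split()) / self.max_len, 1.0)


@ALGORITHMS.register("mean_baseline")
class MeanBaseline:
    """Toy stand-in for an RL algorithm: subtracts the batch-mean reward."""

    def advantages(self, rewards: List[float]) -> List[float]:
        mean = sum(rewards) / len(rewards)
        return [r - mean for r in rewards]


# The config names the components; neither class knows about the other.
config = {"algorithm": "mean_baseline", "reward": "prompt_length"}
algo = ALGORITHMS.build(config["algorithm"])
reward_fn = REWARDS.build(config["reward"], max_len=4)

scores = [reward_fn(p) for p in ["a cat", "a cat on a mat"]]
print(algo.advantages(scores))
```

Swapping in a different reward or algorithm in this sketch only requires registering a new class and changing a string in `config`, which is the kind of decoupling the contribution statement describes.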
Findings
Supports multiple algorithms (GRPO, DiffusionNFT, AWM) across several model architectures (Flux, Qwen-Image, WAN video models)
Provides production-ready memory optimization and distributed training
Enables flexible multi-reward training and rapid prototyping
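As a rough illustration of what multi-reward training involves, the sketch below combines several reward functions into one scalar with a weighted average. The source does not say how Flow-Factory aggregates rewards, so the weighted-average rule, the `combine_rewards` helper, and both toy reward functions are assumptions made for this example only.

```python
from typing import Callable, Dict, Tuple

RewardFn = Callable[[str], float]


def combine_rewards(sample: str, rewards: Dict[str, Tuple[RewardFn, float]]) -> float:
    """Weighted average of several reward functions evaluated on one sample.

    `rewards` maps a label to a (function, weight) pair. Weighted averaging
    is an assumed aggregation rule, not one taken from the paper.
    """
    total_weight = sum(weight for _, weight in rewards.values())
    if total_weight <= 0:
        raise ValueError("reward weights must sum to a positive value")
    weighted = sum(weight * fn(sample) for fn, weight in rewards.values())
    return weighted / total_weight


def mentions_cat(prompt: str) -> float:
    """Toy reward: 1.0 if the word 'cat' appears in the prompt, else 0.0."""
    return 1.0 if "cat" in prompt.split() else 0.0


def is_short(prompt: str) -> float:
    """Toy reward: 1.0 if the prompt has at most four words, else 0.0."""
    return 1.0 if len(prompt.split()) <= 4 else 0.0


# Hypothetical reward mix: the keyword reward counts three times as much.
reward_mix = {
    "keyword": (mentions_cat, 3.0),
    "brevity": (is_short, 1.0),
}

for prompt in ["a red cat", "a very long prompt about a cat"]:
    print(prompt, combine_rewards(prompt, reward_mix))
```

Keeping each reward as an independent callable with its own weight means the mix can be changed from configuration without touching the training loop.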
Abstract
Reinforcement learning has emerged as a promising paradigm for aligning diffusion and flow-matching models with human preferences, yet practitioners face fragmented codebases, model-specific implementations, and engineering complexity. We introduce Flow-Factory, a unified framework that decouples algorithms, models, and rewards through a modular, registry-based architecture. This design enables seamless integration of new algorithms and architectures, as demonstrated by our support for GRPO, DiffusionNFT, and AWM across Flux, Qwen-Image, and WAN video models. By minimizing implementation overhead, Flow-Factory empowers researchers to rapidly prototype and scale future innovations with ease. Flow-Factory provides production-ready memory optimization, flexible multi-reward training, and seamless distributed training support. The codebase is available at…
Taxonomy
Topics: Reinforcement Learning in Robotics · Stochastic Gradient Optimization Techniques · Human Pose and Action Recognition
