# Flow-Multi: A Flow-Matching Multi-Reward Framework for Text-to-Image Generation

**Authors:** Jaegun Lee, Janghoon Choi

PMC · DOI: 10.3390/s26041120 · 2026-02-09

## TL;DR

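This paper introduces Flow-Multi, a flow-matching reinforcement learning framework for text-to-image generation. It scores each sample with four reward models and updates the policy only on Pareto non-dominated samples, aiming for balanced gains in image quality and human-preference alignment rather than overfitting to a single metric.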

## Contribution

The contribution is a multi-reward reinforcement learning framework built on flow-matching GRPO. It combines Pareto-dominance filtering of samples with advantage masking to reduce overfitting to a single metric and reward hacking.
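For reference, the standard definition of Pareto dominance over reward vectors can be written as follows. The notation is an assumption made here for illustration (the summary does not give the paper's symbols): $\mathbf{r}^{(i)} \in \mathbb{R}^K$ is the reward vector of sample $i$, with $K = 4$ reward models and higher values being better.

$$
\mathbf{r}^{(i)} \succ \mathbf{r}^{(j)} \iff \big(\forall k:\ r^{(i)}_k \ge r^{(j)}_k\big) \ \wedge\ \big(\exists k:\ r^{(i)}_k > r^{(j)}_k\big), \qquad k \in \{1, \dots, K\}
$$

Under this definition, sample $j$ belongs to the non-dominated set when no sample $i$ in the same group satisfies $\mathbf{r}^{(i)} \succ \mathbf{r}^{(j)}$. The paper may use a variant of this relation.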

## Key findings

- Flow-Multi achieves balanced improvements across multiple reward criteria compared to Flow-GRPO.
- Pareto-dominance filtering removes dominated samples, and advantage masking suppresses the contribution of low-reward samples, so the policy update reflects only high-quality samples.
- The framework demonstrates stable alignment in text-to-image generation without overfitting to specific metrics.
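To make the advantage-masking idea concrete, here is a minimal sketch. It assumes the usual GRPO-style group-relative advantage (reward minus group mean, divided by group standard deviation) and assumes that "masking" means zeroing non-positive advantages. The summary does not state the paper's exact masking rule, and the function name `masked_group_advantages` is hypothetical.

```python
import numpy as np

def masked_group_advantages(rewards, eps=1e-8):
    """Group-relative advantages with non-positive entries zeroed out.

    Assumption: the mask keeps only samples that score above the group
    mean; the paper's actual masking criterion may differ.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    # GRPO-style normalization within one group of samples for a prompt.
    adv = (rewards - rewards.mean()) / (rewards.std() + eps)
    # Suppress the contribution of below-average samples.
    return np.where(adv > 0.0, adv, 0.0)

# Scalar rewards for one hypothetical group of four generated images.
group_rewards = [0.2, 0.5, 0.8, 0.5]
advantages = masked_group_advantages(group_rewards)
```

With this choice, below-average samples contribute no gradient signal instead of being pushed away from, which is one plausible reading of "only high-quality rewards are reflected in policy optimization".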

## Abstract

Recent approaches in text-to-image (T2I) generation have actively adopted reinforcement learning (RL) techniques for human preference alignment. However, existing approaches primarily rely on a single reward function, which can lead to overfitting on specific metrics, resulting in issues such as reward hacking and imbalanced optimization among multiple objectives. To address this, we propose Flow-Multi: a flow-matching multi-reward framework for text-to-image generation. Our method builds upon flow-matching-based group-relative policy optimization (GRPO) learning. Each sample is evaluated by four reward models—based on text-to-image alignment, human preference, aesthetic quality, and GenEval—to create a multi-dimensional reward vector. We then utilize the Pareto dominance relationship to remove dominated samples and update the policy using only the non-dominated set. Additionally, we introduce advantage masking during training to suppress the contribution of low-reward samples, ensuring that only high-quality rewards are reflected in policy optimization. Experimental results demonstrate that Flow-Multi achieves balanced improvements across multiple reward criteria compared to the existing Flow-GRPO, validating the effectiveness of the multi-reward reinforcement learning framework for stable alignment in text-to-image generation.
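The Pareto filtering step described in the abstract can be sketched as below. This is an illustrative NumPy version, not the authors' implementation: the function name `non_dominated_mask`, the column order of the rewards, and the example values are all assumptions, and higher reward is assumed to be better on every dimension.

```python
import numpy as np

def non_dominated_mask(rewards: np.ndarray) -> np.ndarray:
    """Boolean mask of Pareto non-dominated rows.

    rewards has shape (n_samples, n_rewards); higher is better.
    """
    # ge[i, j]: sample i is at least as good as sample j on every reward.
    ge = (rewards[:, None, :] >= rewards[None, :, :]).all(axis=-1)
    # gt[i, j]: sample i is strictly better than sample j on some reward.
    gt = (rewards[:, None, :] > rewards[None, :, :]).any(axis=-1)
    dominates = ge & gt  # dominates[i, j]: sample i dominates sample j
    # Keep sample j only if no sample i dominates it.
    return ~dominates.any(axis=0)

# Hypothetical reward vectors for four samples of one prompt.
# Assumed columns: alignment, human preference, aesthetics, GenEval.
rewards = np.array([
    [0.9, 0.2, 0.5, 0.7],
    [0.8, 0.1, 0.4, 0.6],  # dominated by row 0
    [0.3, 0.9, 0.6, 0.2],
    [0.2, 0.8, 0.6, 0.1],  # dominated by row 2
])
mask = non_dominated_mask(rewards)
kept = rewards[mask]  # samples that would be used for the policy update
```

The pairwise comparison costs O(n² K) for n samples and K rewards, which is inexpensive for the small group sizes typical of GRPO-style training.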

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12943997/full.md

---
Source: https://tomesphere.com/paper/PMC12943997