# OneReward: Unified Mask-Guided Image Generation via Multi-Task Human Preference Learning

**Authors:** Yuan Gong, Xionghui Wang, Jie Wu, Shiyin Wang, Yitong Wang, Xinglong Wu

arXiv: 2508.21066 · 2025-08-29

## TL;DR

OneReward introduces a unified reinforcement learning framework using a single vision-language model to improve multi-task mask-guided image generation, outperforming existing methods without task-specific fine-tuning.

## Contribution

The paper presents a multi-task reinforcement learning approach in which a single unified reward model covers diverse mask-guided image editing tasks, eliminating the need for task-specific supervised fine-tuning (SFT).

## Key findings

- Outperforms commercial and open-source competitors (Ideogram, Adobe Photoshop, FLUX Fill [Pro]) across multiple evaluation dimensions
- Enables multi-task image editing without task-specific fine-tuning
- Demonstrates effective generalization across diverse image generation tasks

## Abstract

In this paper, we introduce OneReward, a unified reinforcement learning framework that enhances the model's generative capabilities across multiple tasks under different evaluation criteria using only *One Reward* model. By employing a single vision-language model (VLM) as the generative reward model, which can distinguish the winner and loser for a given task and a given evaluation criterion, it can be effectively applied to multi-task generation models, particularly in contexts with varied data and diverse task objectives. We utilize OneReward for mask-guided image generation, which can be further divided into several sub-tasks such as image fill, image extend, object removal, and text rendering, involving a binary mask as the edit area. Although these domain-specific tasks share the same conditioning paradigm, they differ significantly in underlying data distributions and evaluation metrics. Existing methods often rely on task-specific supervised fine-tuning (SFT), which limits generalization and training efficiency. Building on OneReward, we develop Seedream 3.0 Fill, a mask-guided generation model trained via multi-task reinforcement learning directly on a pre-trained base model, eliminating the need for task-specific SFT. Experimental results demonstrate that our unified edit model consistently outperforms both commercial and open-source competitors, such as Ideogram, Adobe Photoshop, and FLUX Fill [Pro], across multiple evaluation dimensions. Code and model are available at: https://one-reward.github.io
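The abstract describes one reward model that picks a winner and a loser for a given task and a given evaluation criterion. The sketch below illustrates that interface only, under stated assumptions: the `Query` type, the prompt wording, the task and criterion labels, the `toy_judge` stand-in for the VLM, and the averaging across criteria are all hypothetical and are not taken from the paper or its released code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Query:
    """Identifies what the single reward model is asked to judge."""
    task: str       # illustrative label, e.g. "image_fill" or "object_removal"
    criterion: str  # illustrative label, e.g. "structure" or "aesthetics"


# Hypothetical judge interface: returns P(first candidate wins) in [0, 1].
Judge = Callable[[Query, str, str], float]


def build_prompt(query: Query) -> str:
    """Fold task and criterion into one instruction so one model can serve all tasks."""
    return (
        f"Task: {query.task}. Criterion: {query.criterion}. "
        "Is the first image better than the second? Answer Yes or No."
    )


def pairwise_reward(judge: Judge, query: Query, candidate: str, reference: str) -> float:
    """Reward for `candidate` is the judge's probability that it beats `reference`."""
    p = judge(query, candidate, reference)
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"judge returned {p}, expected a probability")
    return p


def mean_reward(judge: Judge, task: str, criteria: List[str],
                candidate: str, reference: str) -> float:
    """Average the pairwise reward over criteria (an assumed aggregation)."""
    if not criteria:
        raise ValueError("at least one criterion is required")
    rewards = [pairwise_reward(judge, Query(task, c), candidate, reference)
               for c in criteria]
    return sum(rewards) / len(rewards)


# Toy stand-in for the VLM: hand-written per-criterion scores for two image ids.
TOY_SCORES: Dict[Tuple[str, str], float] = {
    ("structure", "img_a"): 0.9, ("structure", "img_b"): 0.4,
    ("aesthetics", "img_a"): 0.3, ("aesthetics", "img_b"): 0.7,
}


def toy_judge(query: Query, first: str, second: str) -> float:
    """Return 1.0 if `first` scores higher on the criterion, 0.0 if lower, 0.5 on a tie."""
    a = TOY_SCORES[(query.criterion, first)]
    b = TOY_SCORES[(query.criterion, second)]
    if a > b:
        return 1.0
    if a < b:
        return 0.0
    return 0.5
```

Because the task and criterion travel with each query, the same judge can be reused across sub-tasks such as image fill or object removal; in a real setup `toy_judge` would be replaced by a call to the trained VLM, and the reward would feed a reinforcement learning update whose details are given in the full paper rather than in this summary.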

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.21066/full.md

## Figures

14 figures with captions in the complete paper: https://tomesphere.com/paper/2508.21066/full.md

## References

33 references — full list in the complete paper: https://tomesphere.com/paper/2508.21066/full.md

---
Source: https://tomesphere.com/paper/2508.21066