PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward Modeling

Bowen Ping; Chengyou Jia; Minnan Luo; Changliang Xia; Xin Shen; Zhuohang Dang; Hangwei Qian

arXiv:2512.04784 · cs.CV · March 17, 2026

Open Access · 5 Models · 2 Datasets

TL;DR

PaCo-RL is a reinforcement learning framework that combines a pairwise consistency reward model (PaCo-Reward) with an efficient RL algorithm (PaCo-GRPO) to keep identities, styles, and logical coherence consistent across multiple generated images, in closer agreement with human perception.

Contribution

The paper presents a new RL-based approach with a specialized reward model and an efficient optimization method for enhancing visual consistency in image generation tasks.

Findings

01  PaCo-Reward improves alignment with human perception of consistency.

02  PaCo-GRPO achieves state-of-the-art consistency performance.

03  The framework enhances training efficiency and stability.
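
For readers unfamiliar with the GRPO family that PaCo-GRPO's name refers to, the sketch below shows the group-relative advantage step of standard GRPO: rewards for a group of samples drawn from the same prompt are normalized by the group's mean and standard deviation. This is not the paper's PaCo-GRPO. The function name, the use of population standard deviation, and the example reward values are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sampled group (standard GRPO-style step).

    Hypothetical helper: `rewards` holds one scalar reward per image
    sampled for the same prompt. Population std is an assumption; some
    implementations use the sample std instead.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps guards against division by zero when all rewards are equal.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Illustrative consistency scores for four samples of one prompt.
rewards = [0.2, 0.5, 0.9, 0.4]
advantages = group_relative_advantages(rewards)
```

Samples scoring above the group mean receive a positive advantage and those below receive a negative one, so the policy update needs no separate learned value baseline.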

Abstract

Consistent image generation requires faithfully preserving identities, styles, and logical coherence across multiple images, which is essential for applications such as storytelling and character design. Supervised training approaches struggle with this task due to the lack of large-scale datasets capturing visual consistency and the complexity of modeling human perceptual preferences. In this paper, we argue that reinforcement learning (RL) offers a promising alternative by enabling models to learn complex and subjective visual criteria in a data-free manner. To achieve this, we introduce PaCo-RL, a comprehensive framework that combines a specialized consistency reward model with an efficient RL algorithm. The first component, PaCo-Reward, is a pairwise consistency evaluator trained on a large-scale dataset constructed via automated sub-figure pairing. It evaluates consistency through…
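
The abstract mentions a dataset built by automated sub-figure pairing but does not describe the procedure here. As a rough sketch of the general idea only, the code below assumes a composite image laid out as a regular grid, computes a crop box for each sub-figure, and enumerates every unordered pair as a candidate input for a pairwise evaluator. The grid layout, the function names `split_grid` and `pair_subfigures`, and the image size are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def split_grid(width, height, rows, cols):
    """Return (left, top, right, bottom) crop boxes for a rows x cols grid.

    Assumes equally sized cells; any remainder pixels at the right or
    bottom edge are ignored.
    """
    cell_w, cell_h = width // cols, height // rows
    return [
        (c * cell_w, r * cell_h, (c + 1) * cell_w, (r + 1) * cell_h)
        for r in range(rows)
        for c in range(cols)
    ]


def pair_subfigures(boxes):
    """Enumerate every unordered pair of sub-figure boxes."""
    return list(combinations(boxes, 2))


# A 2 x 2 composite of 1024 x 1024 pixels (illustrative values).
boxes = split_grid(1024, 1024, rows=2, cols=2)
pairs = pair_subfigures(boxes)
```

A grid of n sub-figures yields n(n-1)/2 pairs, so a single composite image can contribute several pairwise examples.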

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Face Recognition and Analysis