RubricRL: Simple Generalizable Rewards for Text-to-Image Generation

Xuelu Feng; Yunsheng Li; Ziyu Wan; Zixuan Gao; Junsong Yuan; Dongdong Chen; Chunming Qiao

arXiv:2511.20651·cs.CV·November 26, 2025

Open Access

TL;DR

RubricRL introduces a flexible, interpretable reward framework for text-to-image models that constructs structured, prompt-specific rubrics evaluated by multimodal judges, enhancing alignment with human preferences.

Contribution

RubricRL presents a novel rubric-based reward system that improves interpretability, modularity, and user control in reinforcement learning for text-to-image generation.

Findings

1. Improves prompt faithfulness and visual detail.
2. Enhances model generalizability.
3. Offers a flexible, user-adjustable reward mechanism.

Abstract

Reinforcement learning (RL) has recently emerged as a promising approach for aligning text-to-image generative models with human preferences. A key challenge, however, lies in designing effective and interpretable rewards. Existing methods often rely on either composite metrics (e.g., CLIP, OCR, and realism scores) with fixed weights or a single scalar reward distilled from human preference models, which can limit interpretability and flexibility. We propose RubricRL, a simple and general framework for rubric-based reward design that offers greater interpretability, composability, and user control. Instead of using a black-box scalar signal, RubricRL dynamically constructs a structured rubric for each prompt--a decomposable checklist of fine-grained visual criteria such as object correctness, attribute accuracy, OCR fidelity, and realism--tailored to the input text. Each criterion is…
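
To make the idea of a decomposable, prompt-specific checklist concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the abstract is truncated before it says how each criterion is scored or how scores are combined, so the pass/fail verdicts, the weighted-average aggregation, and every name here (`Criterion`, `build_rubric`, `rubric_reward`, the `judge` callable) are hypothetical. The criterion names follow the examples listed in the abstract.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Criterion:
    """One rubric item. `weight` is a hypothetical user-adjustable knob."""
    name: str
    question: str
    weight: float = 1.0


def build_rubric(prompt: str) -> List[Criterion]:
    """Hypothetical stand-in for per-prompt rubric construction.

    A hand-written rule replaces whatever the real system uses:
    the OCR criterion is added only when the prompt quotes some text.
    """
    rubric = [
        Criterion("object_correctness", f"Are the objects in the prompt present? ({prompt})"),
        Criterion("attribute_accuracy", f"Do colors, counts, and sizes match? ({prompt})"),
        Criterion("realism", "Does the image look realistic?"),
    ]
    if '"' in prompt:
        rubric.append(Criterion("ocr_fidelity", "Is the quoted text rendered correctly?"))
    return rubric


def rubric_reward(rubric: List[Criterion], judge: Callable[[Criterion], bool]) -> float:
    """Assumed aggregation: weighted fraction of criteria the judge marks as passed."""
    total = sum(c.weight for c in rubric)
    if total == 0:
        return 0.0
    passed = sum(c.weight for c in rubric if judge(c))
    return passed / total


# Usage with a stubbed judge (a real judge would inspect the generated image).
rubric = build_rubric('a red cube next to a sign that says "OPEN"')
verdicts = {
    "object_correctness": True,
    "attribute_accuracy": True,
    "realism": True,
    "ocr_fidelity": False,
}
reward = rubric_reward(rubric, lambda c: verdicts[c.name])
print(reward)  # 0.75
```

Because each criterion keeps its own name and verdict, a low reward can be traced to the specific item that failed, and changing a `weight` changes what the reward emphasizes without retraining a separate reward model. Both properties are what the interpretability and user-control claims above would require of any such design.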

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Digital Humanities and Scholarship