Unified Personalized Reward Model for Vision Generation

Yibin Wang; Yuhang Zang; Feng Han; Jiazi Bu; Yujie Zhou; Cheng Jin; Jiaqi Wang

arXiv:2602.02380·cs.CV·February 11, 2026

Open Access · 10 models · 1 dataset

TL;DR

This paper introduces UnifiedReward-Flex, a personalized reward model for vision generation that interprets semantic intent, grounds visual evidence, and constructs hierarchical assessments to better align with human preferences.

Contribution

The paper proposes a personalized reward model that combines flexible, context-aware reasoning with hierarchical assessment to evaluate visual content.

Findings

01. Outperforms existing reward models in image and video synthesis tasks.
02. Enhances alignment with subjective human preferences.
03. Demonstrates superior reasoning fidelity and discriminative ability.

Abstract

Recent advancements in multimodal reward models (RMs) have significantly propelled the development of visual generation. Existing frameworks typically adopt Bradley-Terry-style preference modeling or leverage generative VLMs as judges, and subsequently optimize visual generation models via reinforcement learning. However, current RMs suffer from inherent limitations: they often follow a one-size-fits-all paradigm that assumes a monolithic preference distribution or relies on fixed evaluation rubrics. As a result, they are insensitive to content-specific visual cues, leading to systematic misalignment with subjective and context-dependent human preferences. To this end, inspired by human assessment, we propose UnifiedReward-Flex, a unified personalized reward model for vision generation that couples reward modeling with flexible and context-adaptive reasoning. Specifically, given a…
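
For readers unfamiliar with the Bradley-Terry-style preference modeling that the abstract names as the existing baseline, the following is a minimal sketch of that general formulation, not of UnifiedReward-Flex itself. It assumes a reward model has already produced one scalar score per candidate image or video; the function names `bt_preference_prob` and `bt_loss` are hypothetical and do not come from the paper.

```python
import math


def bt_preference_prob(reward_a: float, reward_b: float) -> float:
    """Bradley-Terry probability that candidate A is preferred over B.

    Equals sigmoid(reward_a - reward_b), computed in a numerically
    stable way so large score gaps do not overflow math.exp.
    """
    diff = reward_a - reward_b
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


def bt_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Negative log-likelihood of one observed human preference.

    Equals -log(sigmoid(reward_chosen - reward_rejected)), written as a
    stable softplus of the negated score gap.
    """
    diff = reward_chosen - reward_rejected
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


if __name__ == "__main__":
    # Hypothetical scalar scores for two generated images of one prompt.
    print(bt_preference_prob(1.3, 0.4))
    print(bt_loss(1.3, 0.4))
```

Because each candidate is reduced to a single scalar compared on one scale, this formulation treats preferences as coming from one shared distribution, which is the one-size-fits-all assumption the abstract criticizes.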

Code & Models

Models

Datasets

CodeGoat24/UnifiedReward-Flex-SFT-90K
dataset · 137 downloads

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Human Pose and Action Recognition