Unified Personalized Reward Model for Vision Generation
Yibin Wang, Yuhang Zang, Feng Han, Jiazi Bu, Yujie Zhou, Cheng Jin, Jiaqi Wang

TL;DR
This paper introduces UnifiedReward-Flex, a personalized reward model for vision generation that interprets semantic intent, grounds visual evidence, and constructs hierarchical assessments to better align with human preferences.
Contribution
It proposes a novel personalized reward model that incorporates flexible, context-aware reasoning and hierarchical assessment for improved visual content evaluation.
Findings
Outperforms existing reward models in image and video synthesis tasks.
Enhances alignment with subjective human preferences.
Demonstrates superior reasoning fidelity and discriminative ability.
Abstract
Recent advancements in multimodal reward models (RMs) have significantly propelled the development of visual generation. Existing frameworks typically adopt Bradley-Terry-style preference modeling or leverage generative VLMs as judges, and subsequently optimize visual generation models via reinforcement learning. However, current RMs suffer from inherent limitations: they often follow a one-size-fits-all paradigm that assumes a monolithic preference distribution or relies on fixed evaluation rubrics. As a result, they are insensitive to content-specific visual cues, leading to systematic misalignment with subjective and context-dependent human preferences. To this end, inspired by human assessment, we propose UnifiedReward-Flex, a unified personalized reward model for vision generation that couples reward modeling with flexible and context-adaptive reasoning. Specifically, given a…
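For readers unfamiliar with the Bradley-Terry-style preference modeling that the abstract names as a common existing approach, here is a minimal sketch of the generic textbook formulation. It assumes each sample has already been mapped to a scalar reward; the function names `bt_preference_prob` and `bt_loss` are illustrative and are not taken from the paper or its code. This is not UnifiedReward-Flex's method, only the baseline it is contrasted with.

```python
import math


def bt_preference_prob(reward_a: float, reward_b: float) -> float:
    """Probability that sample A is preferred over sample B.

    Under the Bradley-Terry model this is sigmoid(r_a - r_b),
    so only the reward difference matters.
    """
    return 1.0 / (1.0 + math.exp(-(reward_a - reward_b)))


def bt_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Negative log-likelihood of one observed human preference.

    Minimizing this pushes the chosen sample's reward above
    the rejected sample's reward.
    """
    return -math.log(bt_preference_prob(reward_chosen, reward_rejected))


if __name__ == "__main__":
    # Hypothetical scalar rewards for two generated images.
    print(bt_preference_prob(1.5, 0.5))
    print(bt_loss(1.5, 0.5))
```

Because the model reduces every comparison to a single scalar difference, it has no built-in way to vary the evaluation criteria with the prompt or content, which is the "one-size-fits-all" limitation the abstract points to.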
Code & Models
- 🤗 CodeGoat24/Wan2.1-T2V-14B-UnifiedReward-Flex-lora (model) · 146 downloads · ♡ 6
- 🤗 CodeGoat24/UnifiedReward-Flex-qwen3vl-2b (model) · 12 downloads
- 🤗 CodeGoat24/UnifiedReward-Flex-qwen3vl-4b (model) · 8 downloads
- 🤗 CodeGoat24/UnifiedReward-Flex-qwen3vl-8b (model) · 133 downloads
- 🤗 CodeGoat24/UnifiedReward-Flex-qwen3vl-32b (model) · 31 downloads
- 🤗 CodeGoat24/FLUX.1-dev-UnifiedReward-Flex (model) · 21 downloads · ♡ 3
- 🤗 CodeGoat24/FLUX.2-klein-base-9B-UnifiedReward-Flex-lora (model) · 386 downloads · ♡ 19
- 🤗 CodeGoat24/Wan2.2-T2V-A14B-UnifiedReward-Flex-lora (model) · 268 downloads · ♡ 12
- 🤗 CodeGoat24/UnifiedReward-Flex-qwen35-4b (model) · 23 downloads
- 🤗 CodeGoat24/UnifiedReward-Flex-qwen35-9b (model) · 87 downloads
Taxonomy
Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Human Pose and Action Recognition
