User-centric Subjective Leaderboard by Customizable Reward Modeling

Qi Jia; Xiujie Song; Zicheng Zhang; Yijin Guo; Kaiwei Zhang; Zijian Chen; Guangtao Zhai

arXiv:2508.09463 · cs.CL · August 14, 2025

TL;DR

This paper introduces a user-centric, dynamic leaderboard for LLMs based on subjective human preferences, utilizing customizable reward models that outperform larger models and handle preference diversity effectively.

Contribution

The paper presents the first User-Centric Subjective Leaderboard (USL), a preference-driven ranking of LLMs, together with a lightweight Customizable Reward Model (CRM) that generalizes well across diverse human preferences and scenarios.

Findings

01 CRM surpasses GPT-4.1 and Gemini-2.5-pro in performance

02 USL correlates negatively with contradictory preferences

03 Human preferences show significant diversity and contradictions

Abstract

Existing benchmarks for large language models (LLMs) predominantly focus on assessing their capabilities through verifiable tasks. Such objective and static benchmarks offer limited utility for practical LLM selection, making it difficult for users to find suitable models for their individual needs. To bridge this gap, we present the first User-Centric Subjective Leaderboard (USL), which provides a preference-driven, dynamic ranking of LLMs across diverse real-world scenarios. Our work is built upon a thorough investigation of real human preference data, involving more than 10K subjective queries. Our investigation reveals significant diversity and contradictions in human preferences, which limit the effectiveness of state-of-the-art reward models. To address this, we introduce Customizable Reward Models (CRMs). With only 4B parameters, our CRM surpasses the performance of leading…
