Approximating Human Preferences Using a Multi-Judge Learned System

Eitán Sprejer; Fernando Avalos; Augusto Bernardi; Jose Pedro Brito de Azevedo Faustino; Jacob Haimes; Narmeen Fatimah Oozeer

arXiv:2510.25884 · cs.AI · October 31, 2025

TL;DR

This paper introduces a framework that models diverse human preferences by aggregating outputs from multiple judges conditioned on different rubrics, improving alignment of language models with human values.

Contribution

The paper presents a persona-based method for synthesizing preference labels at scale and implements two aggregators, a GAM and an MLP, that combine the outputs of multiple rubric-conditioned judges for reward modeling in LLMs.

Findings

01. Improved alignment with human preferences.

02. Robustness against judge biases.

03. Effective aggregation of diverse judgments.

Abstract

Aligning LLM-based judges with human preferences is a significant challenge, as they are difficult to calibrate and often suffer from rubric sensitivity, bias, and instability. Overcoming this challenge advances key applications, such as creating reliable reward models for Reinforcement Learning from Human Feedback (RLHF) and building effective routing systems that select the best-suited model for a given user query. In this work, we propose a framework for modeling diverse, persona-based preferences by learning to aggregate outputs from multiple rubric-conditioned judges. We investigate the performance of this approach against naive baselines and assess its robustness through case studies on biases in both human and LLM judges. Our primary contributions include a persona-based method for synthesizing preference labels at scale and two distinct implementations of our aggregator:…
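
The core idea in the abstract, learning to aggregate scores from several rubric-conditioned judges and comparing against a naive baseline, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data is synthetic, the names (`fit_linear_aggregator`, `naive_mean`, `true_weights`) are hypothetical, and a plain least-squares linear model stands in for the paper's aggregators, whose actual architecture and training setup are not given in this summary.

```python
import numpy as np

def add_bias(judge_scores):
    """Append a constant column so the aggregator can learn an offset."""
    return np.hstack([judge_scores, np.ones((judge_scores.shape[0], 1))])

def fit_linear_aggregator(judge_scores, human_scores):
    """Least-squares weights mapping per-judge scores to a preference score.

    A linear stand-in for a learned aggregator; not the paper's model.
    """
    weights, *_ = np.linalg.lstsq(add_bias(judge_scores), human_scores, rcond=None)
    return weights

def aggregate(judge_scores, weights):
    """Apply the learned weights to new judge scores."""
    return add_bias(judge_scores) @ weights

def naive_mean(judge_scores):
    """Naive baseline: average all judges with equal weight."""
    return judge_scores.mean(axis=1)

def mse(pred, target):
    return float(np.mean((pred - target) ** 2))

# Synthetic data (assumption): 4 judges scoring 500 responses on a 0-10 scale.
rng = np.random.default_rng(0)
n_samples, n_judges = 500, 4
judge_scores = rng.uniform(0.0, 10.0, size=(n_samples, n_judges))

# Hypothetical persona that cares about some rubrics far more than others.
true_weights = np.array([0.6, 0.3, 0.1, 0.0])
human_scores = judge_scores @ true_weights + rng.normal(0.0, 0.5, size=n_samples)

# Fit on the first 400 samples, evaluate on the held-out 100.
train, test = slice(0, 400), slice(400, None)
weights = fit_linear_aggregator(judge_scores[train], human_scores[train])

learned_mse = mse(aggregate(judge_scores[test], weights), human_scores[test])
baseline_mse = mse(naive_mean(judge_scores[test]), human_scores[test])
print(f"learned aggregator MSE: {learned_mse:.3f}")
print(f"naive mean MSE:         {baseline_mse:.3f}")
```

In this toy setting the learned weights recover the persona's uneven emphasis across rubrics, which equal-weight averaging cannot do. It says nothing about the results reported in the paper itself.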
