Can LLM be a Personalized Judge?

Yijiang River Dong; Tiancheng Hu; Nigel Collier

arXiv:2406.11657 · cs.CL · June 18, 2024 · 2 citations

TL;DR

This paper critically examines how reliably LLMs can act as personalized judges of user preferences, introduces verbal uncertainty estimation to improve judgment accuracy, and demonstrates promising results comparable to human evaluation on high-certainty samples.

Contribution

It reveals limitations of current LLM-as-a-Judge methods and proposes a certainty-aware approach that significantly enhances judgment reliability and agreement with human ground truth.

Findings

01. Low agreement of LLM-as-a-Judge with human ground truth
02. Verbal uncertainty estimation improves judgment accuracy
03. Achieves over 80% agreement on high-certainty samples
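
The third finding amounts to a selective-evaluation idea: keep only the judgments the LLM reports as high-certainty, then measure agreement with human labels on that subset. Below is a minimal sketch of that bookkeeping, not the authors' implementation. The `Judgment` record, the `agreement_at_certainty` helper, the 1 to 5 certainty scale, and the toy data are all assumptions made for illustration; none of them come from the paper or its repository.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Judgment:
    """One persona-based judgment (hypothetical record layout)."""
    predicted: str   # preference predicted by the LLM judge, e.g. "A" or "B"
    certainty: int   # verbalized certainty; a 1..5 scale is assumed here
    human: str       # human ground-truth preference


def agreement_at_certainty(
    judgments: List[Judgment], min_certainty: int = 1
) -> Tuple[Optional[float], float]:
    """Return (agreement, coverage) for judgments at or above a certainty level.

    agreement is the fraction of kept judgments matching the human label,
    or None when nothing is kept; coverage is the fraction of judgments kept.
    """
    kept = [j for j in judgments if j.certainty >= min_certainty]
    if not kept:
        return None, 0.0
    matches = sum(j.predicted == j.human for j in kept)
    return matches / len(kept), len(kept) / len(judgments)


# Toy data invented for illustration only; these are not results from the paper.
toy = [
    Judgment("A", 5, "A"),
    Judgment("B", 4, "B"),
    Judgment("A", 2, "B"),
    Judgment("B", 1, "A"),
    Judgment("A", 4, "A"),
    Judgment("B", 3, "B"),
]

if __name__ == "__main__":
    for threshold in (1, 4):
        acc, cov = agreement_at_certainty(toy, threshold)
        print(f"certainty >= {threshold}: agreement={acc}, coverage={cov}")
```

Reporting coverage next to agreement matters in this kind of setup, because a stricter certainty threshold can raise agreement simply by discarding most samples.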

Abstract

Ensuring that large language models (LLMs) reflect diverse user values and preferences is crucial as their user bases expand globally. It is therefore encouraging to see the growing interest in LLM personalization within the research community. However, current works often rely on the LLM-as-a-Judge approach for evaluation without thoroughly examining its validity. In this paper, we investigate the reliability of LLM-as-a-Personalized-Judge, asking LLMs to judge user preferences based on personas. Our findings suggest that directly applying LLM-as-a-Personalized-Judge is less reliable than previously assumed, showing low and inconsistent agreement with human ground truth. The personas typically used are often overly simplistic, resulting in low predictive power. To address these issues, we introduce verbal uncertainty estimation into the LLM-as-a-Personalized-Judge pipeline, allowing…

Code & Models

Repositories

dong-river/personalized-judge
PyTorch · Official

Models

🤗 chelleboyer/llm-mm-good-eb8e3f60-56f2-4729-8934-2428ca568d27
model · 1 download

Videos

Can LLM be a Personalized Judge? · Underline

Taxonomy

Topics: Legal Education and Practice Innovations · Legal Systems and Judicial Processes · Judicial and Constitutional Studies