LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods

Haitao Li; Qian Dong; Junjie Chen; Huixue Su; Yujia Zhou; Qingyao Ai; Ziyi Ye; Yiqun Liu

arXiv:2412.05579 · cs.CL · December 11, 2024 · 24 cites

TL;DR

This survey comprehensively reviews the emerging paradigm of using Large Language Models as evaluators, covering their functionality, methodologies, applications, evaluation techniques, limitations, and future directions.

Contribution

It systematically defines LLMs-as-Judges, analyzes their methodologies, applications, and limitations, and discusses future research directions in this rapidly growing field.

Findings

01. LLMs demonstrate strong effectiveness and generalization in evaluation tasks.

02. Various methodologies exist for constructing LLM-based evaluation systems.

03. Identified limitations include biases and interpretability challenges.

Abstract

The rapid advancement of Large Language Models (LLMs) has driven their expanding application across various fields. One of the most promising applications is their role as evaluators based on natural language responses, referred to as "LLMs-as-judges". This framework has attracted growing attention from both academia and industry due to its effectiveness, ability to generalize across tasks, and interpretability in the form of natural language. This paper presents a comprehensive survey of the LLMs-as-judges paradigm from five key perspectives: Functionality, Methodology, Applications, Meta-evaluation, and Limitations. We begin by providing a systematic definition of LLMs-as-judges and introducing their functionality (Why use LLM judges?). Then we address the methodology for constructing an evaluation system with LLMs (How to use LLM judges?). Additionally, we investigate the…

Code & Models

Repositories

cshaitao/awesome-llms-as-judges (Official)

Taxonomy

Topics: Legal Education and Practice Innovations · Artificial Intelligence in Law · Law, AI, and Intellectual Property

Methods: Softmax · Attention Is All You Need