Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences

Shreya Shankar; J.D. Zamfirescu-Pereira; Björn Hartmann; Aditya G. Parameswaran; Ian Arawjo

arXiv:2404.12272·cs.HC·April 19, 2024·1 cites

Open Access

TL;DR

This paper introduces EvalGen, a mixed-initiative system that helps align LLM-generated evaluation functions with human preferences, addressing the challenges of subjective and iterative evaluation of LLM outputs.

Contribution

The paper presents EvalGen, an interface that assists users in generating evaluation criteria and implementing assertions for them, and that uses human grades on a subset of LLM outputs to select the implementations that best align with user preferences.

Findings

01

A qualitative study finds overall support for EvalGen but underscores how subjective the alignment process is.

02

The study identifies "criteria drift": users need criteria to grade outputs, yet the act of grading outputs changes the criteria they define.

03

Some evaluation criteria depend on the specific outputs observed, which challenges approaches that assume criteria can be defined independently of the outputs.

Abstract

Due to the cumbersome nature of human evaluation and limitations of code-based evaluation, Large Language Models (LLMs) are increasingly being used to assist humans in evaluating LLM outputs. Yet LLM-generated evaluators simply inherit all the problems of the LLMs they evaluate, requiring further human validation. We present a mixed-initiative approach to "validate the validators": aligning LLM-generated evaluation functions (be it prompts or code) with human requirements. Our interface, EvalGen, provides automated assistance to users in generating evaluation criteria and implementing assertions. While generating candidate implementations (Python functions, LLM grader prompts), EvalGen asks humans to grade a subset of LLM outputs; this feedback is used to select implementations that better align with user grades. A qualitative study finds overall support for EvalGen but underscores…
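The selection step described in the abstract can be illustrated with a minimal sketch. Everything here is hypothetical: the candidate functions, the "concise response" criterion, the sample grades, and the simple agreement score are assumptions made for illustration, not EvalGen's actual implementation or metric.

```python
from typing import Callable, List, Tuple

# A candidate assertion maps an LLM output to pass (True) or fail (False).
Assertion = Callable[[str], bool]

# Hypothetical candidate implementations of one criterion: "the response is concise".
def under_50_words(output: str) -> bool:
    return len(output.split()) < 50

def under_10_words(output: str) -> bool:
    return len(output.split()) < 10

def under_200_chars(output: str) -> bool:
    return len(output) < 200

def agreement(candidate: Assertion, grades: List[Tuple[str, bool]]) -> float:
    """Fraction of human-graded outputs where the candidate matches the human grade.

    Assumed scoring rule for this sketch; the paper may use a different metric.
    """
    matches = sum(candidate(output) == is_good for output, is_good in grades)
    return matches / len(grades)

def select_best(
    candidates: List[Assertion], grades: List[Tuple[str, bool]]
) -> Tuple[Assertion, float]:
    """Return the candidate with the highest agreement (first one wins ties)."""
    scored = [(agreement(c, grades), c) for c in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score

# Hypothetical human grades on a small subset of outputs (True = good).
graded = [
    ("Paris.", True),
    ("The capital of France is Paris.", True),
    (" ".join(["word"] * 45), True),   # 45 words, 224 characters
    (" ".join(["word"] * 80), False),  # 80 words, too long
]

best, score = select_best([under_10_words, under_200_chars, under_50_words], graded)
print(best.__name__, score)
```

In this toy setup the word-count threshold of 50 agrees with every human grade, while the other two candidates each misjudge the 45-word output. An LLM grader prompt could be wrapped as a function of the same type and compared in the same way.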

Taxonomy

Topics: Financial Distress and Bankruptcy Prediction

Methods: ALIGN