Can Many-Shot In-Context Learning Help LLMs as Evaluators? A Preliminary Empirical Study

Mingyang Song; Mao Zheng; Xuan Luo; Yue Pan

arXiv:2406.11629·cs.CL·February 6, 2025

TL;DR

This study explores how many-shot in-context learning prompts can improve the reliability of large language models when used as evaluators, showing that specific prompt designs enhance evaluation consistency and accuracy.

Contribution

The paper introduces two novel many-shot ICL prompt templates, MSwR and MSoR, to mitigate biases in LLM evaluators and demonstrates their effectiveness with GPT-4o.

Findings

01. GPT-4o performs better in many-shot regimes.
02. MSwR prompts outperform MSoR in evaluation tasks.
03. Increasing in-context examples improves evaluation quality.

Abstract

Utilizing Large Language Models (LLMs) as evaluators to assess the performance of LLMs has garnered attention. However, this kind of evaluation approach is affected by potential biases within LLMs, raising concerns about the accuracy and reliability of the evaluation results of LLMs. To address this problem, we propose and study two many-shot In-Context Learning (ICL) prompt templates to help LLM evaluators mitigate potential biases: Many-Shot with Reference (MSwR) and Many-Shot without Reference (MSoR). Specifically, the former utilizes in-context examples with model-generated evaluation rationales as references, while the latter does not include these references. Using these prompt designs, we investigate the impact of increasing the number of in-context examples on the consistency and quality of the evaluation results. Experimental results show that advanced LLMs, such as GPT-4o,…
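
The difference between the two templates described above can be illustrated with a minimal sketch. This is not the paper's actual prompt: the `Example` fields, the `build_many_shot_prompt` helper, the instruction wording, and the 1 to 10 score scale are all assumptions made for illustration. The only property taken from the abstract is that MSwR-style examples include a model-generated rationale while MSoR-style examples omit it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Example:
    """One in-context evaluation example (hypothetical structure)."""
    question: str
    answer: str
    score: int
    rationale: Optional[str] = None  # model-generated rationale, used only in MSwR mode


def build_many_shot_prompt(examples: List[Example], question: str, answer: str,
                           with_reference: bool) -> str:
    """Assemble a many-shot evaluation prompt.

    with_reference=True  -> MSwR-style: each example carries its rationale.
    with_reference=False -> MSoR-style: rationales are left out.
    The wording and the 1-10 scale are assumptions, not the paper's template.
    """
    parts = ["You are an evaluator. Score each answer from 1 to 10."]
    for i, ex in enumerate(examples, start=1):
        block = [f"Example {i}", f"Question: {ex.question}", f"Answer: {ex.answer}"]
        if with_reference and ex.rationale:
            block.append(f"Rationale: {ex.rationale}")
        block.append(f"Score: {ex.score}")
        parts.append("\n".join(block))
    # The item to be judged goes last, with the score left open for the model.
    parts.append("\n".join(["Now evaluate", f"Question: {question}",
                            f"Answer: {answer}", "Score:"]))
    return "\n\n".join(parts)
```

Calling the helper with the same example list and `with_reference` set to `True` or `False` yields the two prompt variants; growing the list is how the number of in-context examples would be varied in such a setup.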

Taxonomy

Topics: Artificial Intelligence in Law · Legal Education and Practice Innovations