Bayesian Evaluation of Large Language Model Behavior
Rachel Longjohn, Shang Wu, Saatvik Kher, Catarina Belém, Padhraic Smyth

TL;DR
This paper introduces a Bayesian method to quantify uncertainty in evaluating large language models' behavior, addressing limitations of traditional binary assessment approaches and providing more nuanced insights.
Contribution
It presents a Bayesian framework for uncertainty quantification in LLM evaluation metrics, with case studies on harmful response refusal rates and preference comparisons.
Findings
Bayesian approach effectively quantifies uncertainty in LLM evaluations.
Uncertainty estimates improve understanding of model behavior.
Method applied successfully to adversarial and preference benchmarks.
Abstract
It is increasingly important to evaluate how text generation systems based on large language models (LLMs) behave, such as their tendency to produce harmful output or their sensitivity to adversarial inputs. Such evaluations often rely on a curated benchmark set of input prompts provided to the LLM, where the output for each prompt may be assessed in a binary fashion (e.g., harmful/non-harmful or does not leak/leaks sensitive information), and the aggregation of binary scores is used to evaluate the LLM. However, existing approaches to evaluation often neglect statistical uncertainty quantification. With an applied statistics audience in mind, we provide background on LLM text generation and evaluation, and then describe a Bayesian approach for quantifying uncertainty in binary evaluation metrics. We focus in particular on uncertainty that is induced by the probabilistic text generation…
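As a rough illustration of what Bayesian uncertainty quantification for a binary metric can look like, the sketch below places an independent Beta prior on each prompt's refusal probability, updates it with repeated generations for that prompt, and summarizes the posterior of the benchmark-level mean refusal rate. This is a minimal sketch under stated assumptions, not the authors' model: the independent Beta(1, 1) priors, the function names, and the counts are all hypothetical and chosen only for illustration.

```python
import random


def posterior_mean_rate_samples(counts, prior_a=1.0, prior_b=1.0,
                                num_draws=4000, seed=0):
    """Draw posterior samples of the mean per-prompt refusal probability.

    counts: list of (k, n) pairs, where k of n sampled outputs for a
    prompt were scored as refusals. Each prompt gets an independent
    Beta(prior_a, prior_b) prior (an assumption of this sketch).
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(num_draws):
        # Conjugate update: Beta(a + k, b + n - k) for each prompt.
        thetas = [rng.betavariate(prior_a + k, prior_b + n - k)
                  for k, n in counts]
        draws.append(sum(thetas) / len(thetas))
    return draws


def credible_interval(draws, level=0.95):
    """Equal-tailed credible interval from sorted Monte Carlo draws."""
    ordered = sorted(draws)
    tail = (1.0 - level) / 2.0
    lo = ordered[int(tail * (len(ordered) - 1))]
    hi = ordered[int((1.0 - tail) * (len(ordered) - 1))]
    return lo, hi


# Hypothetical data: 4 prompts, 5 sampled generations each.
counts = [(5, 5), (4, 5), (0, 5), (5, 5)]
draws = posterior_mean_rate_samples(counts)
post_mean = sum(draws) / len(draws)
low, high = credible_interval(draws)
print(f"posterior mean refusal rate: {post_mean:.3f}")
print(f"95% credible interval: ({low:.3f}, {high:.3f})")
```

With only a handful of generations per prompt the interval is wide, which is the kind of uncertainty a single aggregate refusal rate hides. Sampling more outputs per prompt, or adding prompts, narrows it.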
Taxonomy
Topics: Topic Modeling · Hate Speech and Cyberbullying Detection · Natural Language Processing Techniques
