Evaluating Nuanced Bias in Large Language Model Free Response Answers

Jennifer Healey; Laurie Byrum; Md Nadeem Akhtar; Moumita Sinha

arXiv:2407.08842·cs.CL·July 15, 2024

TL;DR

This paper introduces a new method for detecting nuanced biases in free response answers generated by large language models, addressing limitations of existing bias benchmarks.

Contribution

It identifies four types of nuanced bias in free text and proposes a semi-automated pipeline with crowd evaluation for their detection.

Findings

01. Identified four nuanced bias types: confidence, implied, inclusion, and erasure.

02. Developed a semi-automated bias detection pipeline.

03. Demonstrated improved bias detection in free responses.

Abstract

Pre-trained large language models (LLMs) can now be easily adapted for specific business purposes using custom prompts or fine tuning. These customizations are often iteratively re-engineered to improve some aspect of performance, but after each change businesses want to ensure that there has been no negative impact on the system's behavior around such critical issues as bias. Prior methods of benchmarking bias use techniques such as word masking and multiple choice questions to assess bias at scale, but these do not capture all of the nuanced types of bias that can occur in free response answers, the types of answers typically generated by LLM systems. In this paper, we identify several kinds of nuanced bias in free text that cannot be similarly identified by multiple choice tests. We describe these as: confidence bias, implied bias, inclusion bias and erasure bias. We present a…

Taxonomy

Topics: Machine Learning and Algorithms · Adversarial Robustness in Machine Learning · Software Testing and Debugging Techniques