Evaluating Nuanced Bias in Large Language Model Free Response Answers
Jennifer Healey, Laurie Byrum, Md Nadeem Akhtar, Moumita Sinha

TL;DR
This paper introduces a method for detecting nuanced biases in free response answers generated by large language models, which existing bias benchmarks based on word masking and multiple choice questions do not fully capture.
Contribution
The paper identifies four types of nuanced bias in free text and proposes a semi-automated pipeline with crowd evaluation for their detection.
Findings
Identified four nuanced bias types: confidence, implied, inclusion, erasure.
Developed a semi-automated bias detection pipeline.
Demonstrated improved bias detection in free responses.
Abstract
Pre-trained large language models (LLMs) can now be easily adapted for specific business purposes using custom prompts or fine tuning. These customizations are often iteratively re-engineered to improve some aspect of performance, but after each change businesses want to ensure that there has been no negative impact on the system's behavior around such critical issues as bias. Prior methods of benchmarking bias use techniques such as word masking and multiple choice questions to assess bias at scale, but these do not capture all of the nuanced types of bias that can occur in free response answers, the types of answers typically generated by LLM systems. In this paper, we identify several kinds of nuanced bias in free text that cannot be similarly identified by multiple choice tests. We describe these as: confidence bias, implied bias, inclusion bias and erasure bias. We present a…
Taxonomy
Topics: Machine Learning and Algorithms · Adversarial Robustness in Machine Learning · Software Testing and Debugging Techniques
