Bias Association Discovery Framework for Open-Ended LLM Generations
Jinhao Pan, Chahat Raj, Ziwei Zhu

TL;DR
The paper introduces BADF, a framework that discovers both known and novel social biases in open-ended LLM outputs by systematically extracting associations between demographic identities and descriptive concepts.
Contribution
BADF is a systematic method for uncovering previously unrecognized bias associations in LLM outputs, going beyond predefined identity-concept pairs.
Findings
BADF successfully identifies known biases in multiple models.
The framework uncovers previously unrecognized bias associations.
Experiments demonstrate BADF's scalability across diverse contexts.
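The identity-concept association idea summarized above can be illustrated with a small sketch. This is not the authors' pipeline: the summary does not say how BADF extracts or scores associations, so the example swaps in a generic pointwise mutual information (PMI) score over co-occurrence counts. The function name `association_scores`, the record format, and the placeholder labels `group_a` and `group_b` are all hypothetical.

```python
from collections import Counter
from math import log2


def association_scores(records):
    """Score identity-concept pairs with PMI (an assumed stand-in metric).

    records: iterable of (identity, concepts) pairs, one per generation,
    where concepts is a list of descriptive terms found in that generation.
    """
    pair_counts = Counter()
    identity_counts = Counter()
    concept_counts = Counter()
    total = 0
    for identity, concepts in records:
        # Count each concept at most once per generation.
        for concept in set(concepts):
            pair_counts[(identity, concept)] += 1
            identity_counts[identity] += 1
            concept_counts[concept] += 1
            total += 1

    scores = {}
    for (identity, concept), observed in pair_counts.items():
        # Expected count if identity and concept were independent.
        expected = identity_counts[identity] * concept_counts[concept] / total
        scores[(identity, concept)] = log2(observed / expected)
    return scores


# Hypothetical toy data: placeholder identities and concepts only.
records = [
    ("group_a", ["caring", "quiet"]),
    ("group_a", ["caring"]),
    ("group_b", ["assertive"]),
    ("group_b", ["caring", "assertive"]),
]
scores = association_scores(records)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

A positive score means the pair co-occurs more often than independence would predict, and a negative score means less often. Because nothing restricts the concept vocabulary in advance, pairs that no predefined list anticipated still receive a score, which is the property the summary attributes to BADF.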
Abstract
Social biases embedded in Large Language Models (LLMs) raise critical concerns, resulting in representational harms -- unfair or distorted portrayals of demographic groups -- that may be expressed in subtle ways through generated language. Existing evaluation methods often depend on predefined identity-concept associations, limiting their ability to surface new or unexpected forms of bias. In this work, we present the Bias Association Discovery Framework (BADF), a systematic approach for extracting both known and previously unrecognized associations between demographic identities and descriptive concepts from open-ended LLM outputs. Through comprehensive experiments spanning multiple models and diverse real-world contexts, BADF enables robust mapping and analysis of the varied concepts that characterize demographic identities. Our findings advance the understanding of biases in…
Taxonomy
Topics: Topic Modeling · Authorship Attribution and Profiling · Machine Learning in Healthcare
