Towards detecting unanticipated bias in Large Language Models
Anna Kruspe

TL;DR
This paper investigates new methods to detect hidden, unanticipated biases in Large Language Models by leveraging Uncertainty Quantification and Explainable AI techniques to improve fairness and transparency.
Contribution
It introduces novel approaches focusing on Uncertainty Quantification and Explainable AI to identify subtle biases in LLMs that are difficult to detect with existing methods.
Findings
Uncertainty measures can reveal biased model behaviors.
Explainability techniques help uncover hidden biases.
Proposed methods improve bias detection in LLMs.
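As a rough illustration of the first finding, the sketch below compares predictive entropy across two groups of prompts. It is not the paper's method. The function names `predictive_entropy` and `entropy_gap` and the example distributions are hypothetical; it assumes you already have the model's output probabilities for each prompt.

```python
import math
from statistics import mean


def predictive_entropy(probs):
    """Shannon entropy (in nats) of one discrete output distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def entropy_gap(dists_a, dists_b):
    """Mean predictive entropy of group A minus that of group B."""
    mean_a = mean(predictive_entropy(d) for d in dists_a)
    mean_b = mean(predictive_entropy(d) for d in dists_b)
    return mean_a - mean_b


# Hypothetical, hand-written distributions (not real model outputs):
# each inner list is the model's probability over two answer options
# for one prompt; the groups differ only in an attribute of interest.
group_a = [[0.9, 0.1], [0.8, 0.2]]
group_b = [[0.5, 0.5], [0.6, 0.4]]

gap = entropy_gap(group_a, group_b)
print(f"entropy gap (A - B): {gap:.3f} nats")
```

A gap far from zero only indicates that the model is systematically more uncertain for one group. Whether that reflects a bias would still need manual inspection, for example with an explainability method.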
Abstract
Over the last year, Large Language Models (LLMs) like ChatGPT have become widely available and have exhibited fairness issues similar to those in previous machine learning systems. Current research is primarily focused on analyzing and quantifying these biases in training data and their impact on the decisions of these models, alongside developing mitigation strategies. This research largely targets well-known biases related to gender, race, ethnicity, and language. However, it is clear that LLMs are also affected by other, less obvious implicit biases. The complex and often opaque nature of these models makes detecting such biases challenging, yet this is crucial due to their potential negative impact in various applications. In this paper, we explore new avenues for detecting these unanticipated biases in LLMs, focusing specifically on Uncertainty Quantification and Explainable AI…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Computational and Text Analysis Methods
