Profiling Bias in LLMs: Stereotype Dimensions in Contextual Word Embeddings
Carolin M. Schuster, Maria-Alexandra Dinisor, Shashwat Ghatiwala, and Georg Groh

TL;DR
This paper introduces a method for profiling biases in large language models by analyzing stereotype dimensions in contextual embeddings, providing intuitive visualizations to better understand and communicate model biases.
Contribution
It proposes bias profiles based on social psychology dictionaries to systematically analyze and visualize gender bias across multiple LLMs and contexts.
Findings
Gender bias varies across layers and contexts in LLMs.
Bias profiles effectively reveal and compare model stereotypes.
Profiles facilitate understanding and mitigation of biases.
Abstract
Large language models (LLMs) are the foundation of the current successes of artificial intelligence (AI); however, they are unavoidably biased. To effectively communicate the risks and encourage mitigation efforts, these models need adequate and intuitive descriptions of their discriminatory properties, appropriate for all audiences of AI. We suggest bias profiles with respect to stereotype dimensions based on dictionaries from social psychology research. Along these dimensions we investigate gender bias in contextual embeddings, across contexts and layers, and generate stereotype profiles for twelve different LLMs, demonstrating how intuitive and useful the profiles are for exposing and visualizing bias.
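The idea of scoring embeddings along a dictionary-defined stereotype dimension can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a dimension is represented as the normalized difference between the mean embeddings of two opposing word lists, and that bias is the difference in mean cosine projection between two groups of term embeddings. The dimension name `warmth`, the function names, and the toy random vectors standing in for contextual embeddings are all hypothetical.

```python
import numpy as np


def dimension_direction(high_vecs, low_vecs):
    """Unit vector pointing from the low pole to the high pole of a dimension."""
    direction = np.mean(high_vecs, axis=0) - np.mean(low_vecs, axis=0)
    return direction / np.linalg.norm(direction)


def bias_profile(group_a, group_b, dimensions):
    """Per-dimension difference in mean cosine projection between two groups."""
    def mean_projection(vecs, direction):
        # Normalize each embedding so the dot product is a cosine similarity.
        unit = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
        return float(np.mean(unit @ direction))

    return {
        name: mean_projection(group_a, d) - mean_projection(group_b, d)
        for name, d in dimensions.items()
    }


# Toy stand-ins for contextual embeddings (assumption: real ones would come
# from a chosen hidden layer of an LLM, averaged over contexts).
rng = np.random.default_rng(0)
dim = 8
axis = np.zeros(dim)
axis[0] = 1.0

high_pole = axis + 0.1 * rng.normal(size=(5, dim))   # e.g. dictionary words for the high pole
low_pole = -axis + 0.1 * rng.normal(size=(5, dim))   # e.g. dictionary words for the low pole
group_a = 0.5 * axis + 0.1 * rng.normal(size=(4, dim))
group_b = -0.5 * axis + 0.1 * rng.normal(size=(4, dim))

dimensions = {"warmth": dimension_direction(high_pole, low_pole)}  # hypothetical dimension name
profile = bias_profile(group_a, group_b, dimensions)
print(profile)
```

A positive value for a dimension means the first group's terms sit closer to the high pole than the second group's. Repeating the computation per layer and per context template would give the kind of layer-wise and context-wise comparison the abstract describes.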
Taxonomy
Topics: Natural Language Processing Techniques · Text Readability and Simplification · Hate Speech and Cyberbullying Detection
