Detecting Linguistic Indicators for Stereotype Assessment with Large Language Models
Rebekka Görge, Michael Mock, Héctor Allende-Cid

TL;DR
This paper introduces a linguistically grounded method for detecting stereotypes in language using large language models, emphasizing interpretability and empirical validation.
Contribution
It proposes a novel approach based on sociolinguistic indicators and demonstrates how LLMs can be instructed to identify and quantify stereotypes in sentences.
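Instructing an LLM through in-context learning means placing annotated demonstrations before the sentence to be analysed. The minimal sketch below shows only the prompt-assembly step. The `Example` type, the "Sentence:/Annotation:" layout, and the parameter `k` (number of demonstrations) are hypothetical choices for illustration. They are not the paper's prompt template, and no model call is made.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """A hypothetical annotated demonstration for in-context learning."""
    sentence: str
    annotation: str  # expected model output, e.g. a JSON string of indicator values


def build_few_shot_prompt(instruction: str, examples: list[Example],
                          sentence: str, k: int) -> str:
    """Join an instruction, the first k demonstrations and the query sentence."""
    if k < 0 or k > len(examples):
        raise ValueError("k must be between 0 and len(examples)")
    parts = [instruction.strip(), ""]
    for ex in examples[:k]:
        parts.append(f"Sentence: {ex.sentence}")
        parts.append(f"Annotation: {ex.annotation}")
        parts.append("")  # blank line separates demonstrations
    # The query ends with an open "Annotation:" slot for the model to complete.
    parts.append(f"Sentence: {sentence}")
    parts.append("Annotation:")
    return "\n".join(parts)
```

Calling it with `k = 0, 1, 2, …` on the same demonstration pool yields zero-shot through k-shot prompts in which only the number of examples changes.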
Findings
Models detect linguistic indicators well but struggle to assess the behaviors and characteristics associated with a social category.
Performance improves with more few-shot examples and larger model size.
GPT-4 and Llama-3.3-70B-Instruct outperform smaller models.
Abstract
Social categories and stereotypes are embedded in language and can introduce data bias into Large Language Models (LLMs). Despite safeguards, these biases often persist in model behavior, potentially leading to representational harm in outputs. While sociolinguistic research provides valuable insights into the formation of stereotypes, NLP approaches for stereotype detection rarely draw on this foundation and often lack objectivity, precision, and interpretability. To fill this gap, in this work we propose a new approach that detects and quantifies the linguistic indicators of stereotypes in a sentence. We derive linguistic indicators from the Social Category and Stereotype Communication (SCSC) framework which indicate strong social category formulation and stereotyping in language, and use them to build a categorization scheme. To automate this approach, we instruct different LLMs…
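The abstract describes turning linguistic indicators into a categorization scheme and quantifying them per sentence. The sketch below shows one simple way such a scheme could be represented and collapsed into a single number: each indicator has ordered levels, each annotated level maps to a value in [0, 1], and the values are combined as a weighted mean. The indicator names, levels, weights, and the weighted-mean rule are all hypothetical placeholders. They are not the SCSC indicators or the paper's scoring function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Indicator:
    """One linguistic indicator in a hypothetical categorization scheme."""
    name: str
    levels: tuple[str, ...]  # ordered from weakest to strongest signal (needs >= 2)
    weight: float            # placeholder weight, not a value from the paper


# Hypothetical scheme: names, levels and weights are illustrative only.
SCHEME = (
    Indicator("category_label", ("none", "specific", "generic"), 1.0),
    Indicator("behavior_abstraction", ("concrete", "interpretive", "trait"), 0.5),
)


def indicator_score(scheme: tuple[Indicator, ...], annotation: dict[str, str]) -> float:
    """Scale each annotated level to [0, 1] and return the weighted mean."""
    total_weight = sum(ind.weight for ind in scheme)
    score = 0.0
    for ind in scheme:
        level = annotation[ind.name]
        if level not in ind.levels:
            raise ValueError(f"unknown level {level!r} for {ind.name}")
        # Position of the level in the ordered tuple, scaled to [0, 1].
        score += ind.weight * ind.levels.index(level) / (len(ind.levels) - 1)
    return score / total_weight
```

In this sketch an LLM's per-indicator annotation (for instance parsed from its JSON output) is the `annotation` dictionary, and the score stays interpretable because each contribution traces back to one named indicator.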
Taxonomy
Topics: Computational and Text Analysis Methods · Hate Speech and Cyberbullying Detection · Topic Modeling
