On The Role of Reasoning in the Identification of Subtle Stereotypes in Natural Language
Jacob-Junqi Tian, Omkar Dige, D. B. Emerson, Faiza Khan Khattak

TL;DR
This paper shows that reasoning, especially multi-step reasoning, improves the accuracy and interpretability of zero-shot stereotype detection in large language models.
Contribution
It demonstrates that reasoning significantly enhances stereotype identification accuracy and interpretability in open-source LLMs, establishing reasoning as essential for bias detection.
Findings
Reasoning improves stereotype detection accuracy.
Multi-step reasoning enhances model interpretability.
Model scaling improves accuracy, but reasoning is crucial to consistent performance.
Abstract
Large language models (LLMs) are trained on vast, uncurated datasets that contain various forms of biases and language reinforcing harmful stereotypes that may be subsequently inherited by the models themselves. Therefore, it is essential to examine and address biases in language models, integrating fairness into their development to ensure that these models do not perpetuate social biases. In this work, we demonstrate the importance of reasoning in zero-shot stereotype identification across several open-source LLMs. Accurate identification of stereotypical language is a complex task requiring a nuanced understanding of social structures, biases, and existing unfair generalizations about particular groups. While improved accuracy is observed through model scaling, the use of reasoning, especially multi-step reasoning, is crucial to consistent performance. Additionally, through a…
Taxonomy
Topics: Natural Language Processing Techniques · Hate Speech and Cyberbullying Detection · Topic Modeling
