On The Role of Reasoning in the Identification of Subtle Stereotypes in Natural Language

Jacob-Junqi Tian; Omkar Dige; D. B. Emerson; Faiza Khan Khattak

arXiv:2308.00071 · cs.CL · October 1, 2024 · 2 cites

TL;DR

This paper argues that reasoning, especially multi-step reasoning, improves both the accuracy and the interpretability of zero-shot stereotype detection by large language models, which in turn supports bias mitigation efforts.

Contribution

The paper demonstrates that reasoning significantly improves the accuracy and interpretability of stereotype identification in open-source LLMs, and it positions reasoning as essential for bias detection.

Findings

01. Reasoning improves stereotype detection accuracy.

02. Multi-step reasoning enhances model interpretability.

03. Scaling models with reasoning yields better bias identification.

Abstract

Large language models (LLMs) are trained on vast, uncurated datasets that contain various forms of biases and language reinforcing harmful stereotypes that may be subsequently inherited by the models themselves. Therefore, it is essential to examine and address biases in language models, integrating fairness into their development to ensure that these models do not perpetuate social biases. In this work, we demonstrate the importance of reasoning in zero-shot stereotype identification across several open-source LLMs. Accurate identification of stereotypical language is a complex task requiring a nuanced understanding of social structures, biases, and existing unfair generalizations about particular groups. While improved accuracy is observed through model scaling, the use of reasoning, especially multi-step reasoning, is crucial to consistent performance. Additionally, through a…

Taxonomy

Topics: Natural Language Processing Techniques · Hate Speech and Cyberbullying Detection · Topic Modeling