Counteracts: Testing Stereotypical Representation in Pre-trained Language Models
Damin Zhang, Julia Rayz, Romila Pradhan

TL;DR
This paper uses counterexamples to probe gender stereotypes in pre-trained language models (PLMs). The models rely on superficial cues rather than on meaning, a finding that informs more neutral strategies for interacting with them.
Contribution
It introduces a counterexample-based method for analyzing stereotypical knowledge in PLMs, focusing on gender stereotypes and evaluating 7 PLMs on 9 types of cloze-style prompts.
Findings
PLMs show some robustness against unrelated information but rely on shallow cues.
PLMs do not interpret information by its meaning.
These findings inform more neutral ways of interacting with PLMs.
Abstract
Recently, language models have demonstrated strong performance on various natural language understanding tasks. Language models trained on large human-generated corpora encode not only a significant amount of human knowledge, but also human stereotypes. As more and more downstream tasks integrate language models into their pipelines, it is necessary to understand the internal stereotypical representation in order to design methods for mitigating the negative effects. In this paper, we use counterexamples to examine the internal stereotypical knowledge in pre-trained language models (PLMs) that can lead to stereotypical preference. We mainly focus on gender stereotypes, but the method can be extended to other types of stereotypes. We evaluate 7 PLMs on 9 types of cloze-style prompts with different information and base knowledge. The results indicate that PLMs show a certain…
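The idea of testing a cloze-style prompt with and without a counterexample can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the template, the counterexample sentence, the helper names (`build_prompt`, `preference_gap`, `counterexample_shift`), and the `toy_fill_mask` stand-in with made-up probabilities are hypothetical and are not the paper's prompts, models, or metrics.

```python
from typing import Callable, Dict

MASK = "[MASK]"  # assumed mask token; real models may use a different one


def build_prompt(template: str, occupation: str, prefix: str = "") -> str:
    """Fill a cloze template and optionally prepend a context sentence."""
    cloze = template.format(occupation=occupation, mask=MASK)
    return f"{prefix} {cloze}".strip()


def preference_gap(probs: Dict[str, float], a: str = "he", b: str = "she") -> float:
    """Return P(a) - P(b) for the masked slot; 0.0 means no preference."""
    return probs.get(a, 0.0) - probs.get(b, 0.0)


def counterexample_shift(
    fill_mask: Callable[[str], Dict[str, float]],
    template: str,
    occupation: str,
    counterexample: str,
) -> float:
    """How much the preference gap shrinks once the counterexample is prepended."""
    base = preference_gap(fill_mask(build_prompt(template, occupation)))
    countered = preference_gap(
        fill_mask(build_prompt(template, occupation, counterexample))
    )
    return base - countered


def toy_fill_mask(prompt: str) -> Dict[str, float]:
    """Stand-in for a masked language model; the probabilities are invented."""
    if "sister" in prompt:
        return {"he": 0.5, "she": 0.375}
    return {"he": 0.75, "she": 0.25}


template = "The {occupation} said that {mask} was tired."
shift = counterexample_shift(
    toy_fill_mask, template, "engineer", "My sister is an engineer."
)
print(build_prompt(template, "engineer", "My sister is an engineer."))
print(shift)
```

To try this against a real model, `toy_fill_mask` would be swapped for a function that queries a masked language model and returns token probabilities for the masked position. A shift near zero would suggest that the added sentence barely changes the model's preference.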
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
Methods: Balanced Selection
