Counteracts: Testing Stereotypical Representation in Pre-trained Language Models

Damin Zhang; Julia Rayz; Romila Pradhan

arXiv:2301.04347 · cs.CL · April 10, 2023

Open Access

TL;DR

This paper uses counterexamples to investigate gender stereotypes in pre-trained language models. It finds that the models rely on superficial cues rather than a deeper understanding of meaning, which informs more neutral strategies for interacting with them.

Contribution

It introduces a method using counterexamples to analyze stereotypical knowledge in PLMs, focusing on gender stereotypes and evaluating multiple models across various prompts.

Findings

01 PLMs show some robustness against unrelated information but rely on shallow cues.

02 PLMs do not interpret information based on its meaning.

03 The findings inform more neutral ways of interacting with PLMs.

Abstract

Recently, language models have demonstrated strong performance on various natural language understanding tasks. Language models trained on large human-generated corpora encode not only a significant amount of human knowledge, but also human stereotypes. As more and more downstream tasks integrate language models into their pipelines, it is necessary to understand the internal stereotypical representation in order to design methods for mitigating its negative effects. In this paper, we use counterexamples to examine the internal stereotypical knowledge in pre-trained language models (PLMs) that can lead to stereotypical preference. We mainly focus on gender stereotypes, but the method can be extended to other types of stereotype. We evaluate 7 PLMs on 9 types of cloze-style prompts with different information and base knowledge. The results indicate that PLMs show a certain…
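To make the idea of a cloze-style prompt with a counterexample concrete, here is a minimal sketch in Python. It is not the authors' code. The mask token, the helper names `build_cloze_prompt` and `preference_gap`, the example sentences, and the probabilities are all hypothetical and chosen only for illustration. In practice the fill probabilities would come from a masked language model.

```python
from typing import Dict, Optional

# Assumed mask token; the actual token differs between models.
MASK = "[MASK]"


def build_cloze_prompt(base: str, counterexample: Optional[str] = None) -> str:
    """Return the cloze prompt, optionally preceded by a counterexample sentence."""
    if MASK not in base:
        raise ValueError("base prompt must contain the mask token")
    return f"{counterexample} {base}" if counterexample else base


def preference_gap(fill_probs: Dict[str, float], stereotypical: str, counter: str) -> float:
    """Probability of the stereotypical fill minus that of the counter fill.

    A positive value means the stereotypical fill is preferred.
    """
    return fill_probs.get(stereotypical, 0.0) - fill_probs.get(counter, 0.0)


# Hypothetical base prompt and counterexample (not taken from the paper).
base = f"The nurse said that {MASK} was tired."
with_counter = build_cloze_prompt(base, "My brother is a nurse.")

# Made-up probabilities standing in for model output; these are not results.
gap_plain = preference_gap({"she": 0.7, "he": 0.2}, "she", "he")
gap_counter = preference_gap({"she": 0.6, "he": 0.3}, "she", "he")
shift = gap_plain - gap_counter  # how much the counterexample reduced the gap
```

Comparing the gap with and without the counterexample is one simple way to ask whether added information changes a model's preference. The paper's actual prompt types and metrics may differ.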

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications

Methods: Balanced Selection