Stepmothers are mean and academics are pretentious: What do pretrained language models learn about you?
Rochelle Choenni, Ekaterina Shutova, Robert van Rooij

TL;DR
This paper explores the stereotypes encoded in pretrained language models, introduces a new dataset, and examines how stereotypes and associated emotions change through fine-tuning on different text sources.
Contribution
The paper presents the first dataset of stereotypical attributes for a range of social groups, together with an unsupervised method for eliciting the stereotypes that pretrained language models encode and linking them to basic emotions.
Findings
Attitudes towards different social groups vary across pretrained language models.
Stereotypes and emotions can shift quickly during fine-tuning.
Fine-tuning on different news sources changes the stereotypes and emotions a model associates with social groups.
Abstract
In this paper, we investigate what types of stereotypical information are captured by pretrained language models. We present the first dataset comprising stereotypical attributes of a range of social groups and propose a method to elicit stereotypes encoded by pretrained language models in an unsupervised fashion. Moreover, we link the emergent stereotypes to their manifestation as basic emotions as a means to study their emotional effects in a more generalized manner. To demonstrate how our methods can be used to analyze emotion and stereotype shifts due to linguistic experience, we use fine-tuning on news sources as a case study. Our experiments expose how attitudes towards different social groups vary across models and how quickly emotions and stereotypes can shift at the fine-tuning stage.
Taxonomy
Topics: Topic Modeling · Hate Speech and Cyberbullying Detection · Computational and Text Analysis Methods
