Local Contrastive Editing of Gender Stereotypes

Marlene Lutz; Rochelle Choenni; Markus Strohmaier; Anne Lauscher

arXiv:2410.17739 · cs.CL · August 5, 2025

Open Access

TL;DR

This paper presents a method called local contrastive editing to precisely identify and modify small subsets of weights in language models that encode gender stereotypes, improving understanding and control of bias.

Contribution

The paper introduces local contrastive editing, a technique that localizes and edits the weights of a target language model that are associated with gender bias, relative to a reference model, enabling targeted bias mitigation.

Findings

01 Localizes gender stereotypes to a small subset (< 0.5%) of model weights.

02 Demonstrates precise localization and control of gender bias in models.

03 Advances understanding of how bias manifests in model parameters.

Abstract

Stereotypical bias encoded in language models (LMs) poses a threat to safe language technology, yet our understanding of how bias manifests in the parameters of LMs remains incomplete. We introduce local contrastive editing that enables the localization and editing of a subset of weights in a target model in relation to a reference model. We deploy this approach to identify and modify subsets of weights that are associated with gender stereotypes in LMs. Through a series of experiments, we demonstrate that local contrastive editing can precisely localize and control a small subset (< 0.5%) of weights that encode gender bias. Our work (i) advances our understanding of how stereotypical biases can manifest in the parameter space of LMs and (ii) opens up new avenues for developing parameter-efficient strategies for controlling model properties in a contrastive manner.
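
The abstract describes two steps: localizing a small subset of weights in a target model relative to a reference model, then editing only that subset. The following is a minimal sketch of that idea, not the authors' implementation. It assumes weights are flat lists of floats, that localization picks the weights with the largest absolute difference between the two models, and that editing interpolates the selected weights toward the reference. The function names `localize` and `edit` and the parameters `fraction` and `alpha` are hypothetical and do not come from the paper.

```python
from typing import List, Sequence


def localize(target: Sequence[float], reference: Sequence[float],
             fraction: float) -> List[int]:
    """Return indices of the weights where target and reference differ most.

    Assumption: "largest absolute difference" stands in for the paper's
    localization criterion, which the abstract does not specify.
    """
    if len(target) != len(reference):
        raise ValueError("target and reference must have the same length")
    # Select at least one weight, otherwise the requested fraction.
    k = max(1, int(len(target) * fraction))
    by_gap = sorted(range(len(target)),
                    key=lambda i: abs(target[i] - reference[i]),
                    reverse=True)
    return sorted(by_gap[:k])


def edit(target: Sequence[float], reference: Sequence[float],
         indices: Sequence[int], alpha: float) -> List[float]:
    """Interpolate the selected weights toward the reference model.

    alpha = 0 leaves the target unchanged; alpha = 1 copies the reference
    values. All weights outside `indices` are left untouched.
    """
    edited = list(target)
    for i in indices:
        edited[i] = (1.0 - alpha) * target[i] + alpha * reference[i]
    return edited


# Toy example: the two models disagree only at index 3.
target = [0.10, -0.40, 0.25, 0.90, -0.05, 0.30, 0.00, -0.70]
reference = [0.10, -0.40, 0.25, -0.90, -0.05, 0.30, 0.00, -0.70]

idx = localize(target, reference, fraction=0.125)
edited = edit(target, reference, idx, alpha=0.5)
print(idx, edited)
```

In this toy setup only the localized weight changes, which mirrors the "local" aspect of the method: the edit touches a small, explicitly selected subset and leaves the rest of the target model as it was.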

Taxonomy

Topics: Digital Rights Management and Security