What Triggers my Model? Contrastive Explanations Inform Gender Choices by Translation Models

Janiça Hackenbuchner; Arda Tezcan; Joke Daems

arXiv:2512.08440 · cs.CL · March 5, 2026

TL;DR

This paper investigates the origins of gender bias in neural machine translation by analyzing which source tokens influence gender choices, using contrastive explanations and saliency attribution to compare model and human perceptions.

Contribution

It applies contrastive saliency attribution to identify the source tokens that influence a translation model's gender decisions, and compares these tokens with the words humans perceive as gender cues.
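
The contrastive idea can be illustrated with a minimal sketch. The paper computes saliency attribution over contrastive translations; this sketch instead swaps in occlusion (erasure), a simpler perturbation-based attribution, so that it runs without a real model. It assumes a hypothetical scorer `score(source_tokens, target)` that returns a log-probability from some NMT model. A source token's attribution is how much the model's preference for the target inflection over a contrastive foil (e.g. German "der Arzt" vs. "die Ärztin") shrinks when that token is removed. The names `contrastive_occlusion` and `toy_score`, and the cue weights, are illustrative inventions.

```python
from typing import Callable, List, Sequence

# Hypothetical scorer: (source tokens, candidate translation) -> log-probability.
# A real NMT model would supply this; it is an assumption of the sketch.
Scorer = Callable[[Sequence[str], str], float]


def contrastive_occlusion(
    source: Sequence[str], target: str, foil: str, score: Scorer
) -> List[float]:
    """Attribute the target-vs-foil preference to each source token.

    Contrastive score = log p(target) - log p(foil). A token's attribution is
    the drop in that score when the token is removed from the source.
    """
    base = score(source, target) - score(source, foil)
    attributions = []
    for i in range(len(source)):
        occluded = list(source[:i]) + list(source[i + 1:])  # drop token i
        diff = score(occluded, target) - score(occluded, foil)
        attributions.append(base - diff)
    return attributions


def toy_score(source: Sequence[str], target: str) -> float:
    """Hypothetical stand-in for an NMT model with made-up cue weights."""
    # Positive weights push toward the masculine form "der Arzt".
    cues = {"he": 2.0, "his": 1.0, "she": -2.0, "her": -1.0}
    bias = sum(cues.get(tok.lower(), 0.0) for tok in source)
    return bias if target == "der Arzt" else -bias


if __name__ == "__main__":
    src = ["The", "doctor", "said", "he", "was", "late"]
    print(contrastive_occlusion(src, "der Arzt", "die Ärztin", toy_score))
```

With the toy scorer, only "he" receives a non-zero attribution, because it is the only token whose removal changes the masculine-vs-feminine preference. Gradient-based saliency would replace the loop of removals with a single backward pass through the real model.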

Findings

1. The salient source words identified overlap with human perceptions.
2. Contrastive attribution reveals the key context influencing gender choices.
3. The analysis highlights potential for bias mitigation strategies.

Abstract

Interpretability methods can be used to understand decisions made by black-box models, such as neural machine translation (NMT) models or large language models (LLMs). Yet little interpretability research has addressed a problem these models demonstrably exhibit: gender bias. In this work, we aim to move beyond simply measuring bias to exploring its origins. Working with gender-ambiguous natural source data, this exploratory study examines which context, in the form of input tokens in the source sentence (EN), influences (or triggers) the NMT model's choice of a certain gender inflection in the target languages (DE/ES). To analyse this, we compute saliency attribution based on contrastive translations. We first address the lack of a scoring threshold for attribution and specifically examine how different attribution levels of source words relate to the model's gender decisions in the translation.…
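Because attribution scores have no natural cut-off, one simple way to examine several attribution levels is to normalise the scores and list the tokens that reach each level. The sketch below assumes per-token scores are already available (for example from a contrastive attribution step); the function name `salient_tokens_by_level`, the max-normalisation, and the default levels 0.25/0.5/0.75 are illustrative assumptions, not the paper's settings.

```python
from typing import Dict, List, Sequence


def salient_tokens_by_level(
    tokens: Sequence[str],
    scores: Sequence[float],
    levels: Sequence[float] = (0.25, 0.5, 0.75),
) -> Dict[float, List[str]]:
    """Return, for each level, the tokens whose normalised score reaches it.

    Scores are divided by the largest absolute score so the strongest token
    gets magnitude 1.0. Negative scores (evidence for the foil) never pass a
    positive level.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    peak = max((abs(s) for s in scores), default=0.0)
    if peak == 0.0:
        # No token carries any attribution; nothing is salient at any level.
        return {level: [] for level in levels}
    normalised = [s / peak for s in scores]
    return {
        level: [t for t, n in zip(tokens, normalised) if n >= level]
        for level in levels
    }


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    toks = ["The", "nurse", "said", "she", "left"]
    print(salient_tokens_by_level(toks, [1.0, 2.0, 0.25, 4.0, 0.0]))
```

For these made-up scores, the set of salient tokens shrinks from "The", "nurse", "she" at level 0.25 to "she" alone at 0.75, which shows why the choice of threshold changes which context is reported as influencing the gender decision.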

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Ethics and Social Impacts of AI · Artificial Intelligence in Healthcare and Education