What Triggers my Model? Contrastive Explanations Inform Gender Choices by Translation Models
Janiça Hackenbuchner, Arda Tezcan, Joke Daems

TL;DR
This paper investigates the origins of gender bias in neural machine translation by analyzing which source tokens influence gender choices, using contrastive explanations and saliency attribution to compare model and human perceptions.
Contribution
It introduces a contrastive attribution method to identify source tokens influencing gender decisions in translation models, linking model behavior to human perceptions.
Findings
Salient words identified overlap with human perceptions
Contrastive attribution reveals key context influencing gender choices
Analysis highlights potential for bias mitigation strategies
Abstract
Interpretability can be implemented to understand decisions taken by (black box) models, such as neural machine translation (NMT) or large language models (LLMs). Yet, research in this area has been limited in relation to a manifested problem in these models: gender bias. In this work, we aim to move away from simply measuring bias to exploring its origins. Working with gender-ambiguous natural source data, this exploratory study examines which context, in the form of input tokens in the source sentence (EN), influences (or triggers) the NMT model's choice of a certain gender inflection in the target languages (DE/ES). To analyse this, we compute saliency attribution based on contrastive translations. We first address the challenge of the lack of a scoring threshold and specifically examine different attribution levels of source words on the model's gender decisions in the translation.…
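To make the idea of saliency attribution over contrastive translations concrete, here is a minimal toy sketch. It is not the authors' pipeline: instead of a real NMT model it assumes a hypothetical linear scorer over the mean source embedding, and every embedding, weight, and name (`contrastive_saliency`, `w_feminine`, `w_masculine`) is invented for illustration. The contrastive score is the difference between the logit of one gendered target form and the logit of its alternative, and each source token receives a gradient-times-input attribution for that difference.

```python
# Toy contrastive saliency: which source tokens push a model toward one
# gendered target form rather than the other?
# ASSUMPTION: all embeddings and weights are made-up numbers, and the
# "model" is a linear scorer over the mean source embedding, not real NMT.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def contrastive_saliency(tokens, embeddings, w_chosen, w_contrast):
    """Gradient x input attribution of logit(chosen) - logit(contrast).

    Each logit is dot(w, mean of source embeddings). Tokens are assumed
    to be unique in this toy example.
    """
    n = len(tokens)
    diff = [a - b for a, b in zip(w_chosen, w_contrast)]
    # For this linear model, d(score)/d(e_i) = diff / n for every token i,
    # so gradient x input reduces to dot(e_i, diff) / n.
    return {tok: dot(embeddings[tok], diff) / n for tok in tokens}

tokens = ["the", "nurse", "arrived"]
embeddings = {
    "the": [0.0, 0.0],
    "nurse": [1.0, -2.0],
    "arrived": [0.5, 0.5],
}
w_feminine = [0.0, -1.0]   # hypothetical weights for the feminine form
w_masculine = [0.0, 1.0]   # hypothetical weights for the masculine form

scores = contrastive_saliency(tokens, embeddings, w_feminine, w_masculine)
top_token = max(scores, key=lambda t: abs(scores[t]))

for tok in tokens:
    print(f"{tok:>8}: {scores[tok]:+.3f}")
print("most salient source token:", top_token)
```

In this toy setting the attributions sum to the contrastive score itself, so a positive value means the token pushes toward the chosen form and a negative value pushes toward the alternative. Deciding how large a score must be to count as salient is the thresholding question the abstract raises, and the sketch does not address it.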
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Ethics and Social Impacts of AI · Artificial Intelligence in Healthcare and Education
