The power of Prompts: Evaluating and Mitigating Gender Bias in MT with LLMs
Aleix Sant, Carlos Escolano, Audrey Mash, Francesca De Luca Fornaciari, Maite Melero

TL;DR
This study evaluates gender bias in machine translation using large language models and demonstrates that prompt engineering can significantly reduce bias, bringing LLMs closer to traditional NMT systems in fairness.
Contribution
The paper explores prompt engineering techniques applied to an instruction-tuned LLM and identifies a prompt structure that reduces gender bias by up to 12% on the WinoMT evaluation dataset compared to more straightforward prompts.
Findings
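Gender bias is pervasive across all tested models, with base LLMs showing more bias than NMT models.
Prompt engineering on an instruction-tuned LLM reduces gender bias by up to 12% on WinoMT.
The gender bias accuracy gap between LLMs and traditional NMT systems is significantly reduced.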
Abstract
This paper studies gender bias in machine translation through the lens of Large Language Models (LLMs). Four widely-used test sets are employed to benchmark various base LLMs, comparing their translation quality and gender bias against state-of-the-art Neural Machine Translation (NMT) models for English to Catalan (En → Ca) and English to Spanish (En → Es) translation directions. Our findings reveal pervasive gender bias across all models, with base LLMs exhibiting a higher degree of bias compared to NMT models. To combat this bias, we explore prompt engineering techniques applied to an instruction-tuned LLM. We identify a prompt structure that significantly reduces gender bias by up to 12% on the WinoMT evaluation dataset compared to more straightforward prompts. These results significantly reduce the gender bias accuracy gap between LLMs and traditional NMT…
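The "gender bias accuracy gap" mentioned above can be illustrated with a minimal sketch. It assumes gender accuracy is the fraction of test sentences in which the translated entity carries the gold gender, in the style of WinoMT-type evaluation. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def gender_accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of examples whose predicted gender matches the gold gender."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("need at least one example")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


def accuracy_gap(reference_acc: float, system_acc: float) -> float:
    """Difference in gender accuracy between a reference system and another system."""
    return reference_acc - system_acc


# Toy, made-up labels for illustration only (not results from the paper).
gold = ["female", "male", "female", "male"]
nmt_pred = ["female", "male", "male", "male"]  # hypothetical NMT output genders
llm_pred = ["male", "male", "male", "male"]    # hypothetical LLM output genders

nmt_acc = gender_accuracy(gold, nmt_pred)
llm_acc = gender_accuracy(gold, llm_pred)
gap = accuracy_gap(nmt_acc, llm_acc)
```

In this toy setup a better prompt would raise `llm_acc` and so shrink `gap`; the actual metrics and prompt structure are those defined in the paper.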
Taxonomy
Topics: Educational Technology and Optimization · Knowledge Management and Sharing · Innovative Teaching and Learning Methods
Methods: Balanced Selection
