Misgendering and Assuming Gender in Machine Translation when Working with Low-Resource Languages
Sourojit Ghosh, Srishti Chatterjee

TL;DR
This chapter examines gender-related errors in machine translation (MT) for low-resource languages, highlighting their societal impacts and proposing solutions to improve linguistic representation and reduce harmful assumptions.
Contribution
It provides a case study on Bengali to illustrate gender inference issues in MT and discusses societal implications and potential solutions for low-resource languages.
Findings
MT systems assume or infer gender in translations between low-resource languages and English, even when the source text provides no such information.
Such errors can lead to linguistic erasure and representational harms.
The chapter proposes solutions that give low-resource languages more agency in MT.
Abstract
This chapter focuses on gender-related errors in machine translation (MT) in the context of low-resource languages. We begin by explaining what low-resource languages are, examining the inseparable social and computational factors that create such linguistic hierarchies. We demonstrate through a case study of our mother tongue Bengali, a global language spoken by almost 300 million people but still classified as low-resource, how gender is assumed and inferred in translations to and from the high(est)-resource English when no such information is provided in source texts. We discuss the postcolonial and societal impacts of such errors leading to linguistic erasure and representational harms, and conclude by discussing potential solutions towards uplifting languages by providing them more agency in MT conversations.
Taxonomy
Topics: Translation Studies and Practices · Gender Studies in Language · Text Readability and Simplification
