ConGA: Guidelines for Contextual Gender Annotation. A Framework for Annotating Gender in Machine Translation
Argentina Anna Rescigno, Eva Vanmassenhove, Johanna Monti

TL;DR
This paper introduces ConGA, a linguistically grounded framework for annotating gender in machine translation, aimed at the masculine bias and inconsistent gender realisation that arise when translating from English into Italian.
Contribution
The paper presents a novel, detailed annotation scheme for gender in MT, along with a new dataset and benchmark to evaluate gender bias in translation systems.
Findings
Systematic masculine bias in current MT systems
Inconsistent feminine gender realization in translations
ConGA enables more accurate gender-aware evaluation
Abstract
Handling gender across languages remains a persistent challenge for Machine Translation (MT) and Large Language Models (LLMs), especially when translating from gender-neutral languages into morphologically gendered ones, such as English to Italian. English largely omits grammatical gender, while Italian requires explicit agreement across multiple grammatical categories. This asymmetry often leads MT systems to default to masculine forms, reinforcing bias and reducing translation accuracy. To address this issue, we present the Contextual Gender Annotation (ConGA) framework, a linguistically grounded set of guidelines for word-level gender annotation. The scheme distinguishes between semantic gender in English through three tags, Masculine (M), Feminine (F), and Ambiguous (A), and grammatical gender realisation in Italian (Masculine (M), Feminine (F)), combined with entity-level…
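As a rough illustration of the tag sets named in the abstract, the sketch below models the three English tags (M, F, A) and the two Italian tags (M, F) and computes how often ambiguous English words are realised as masculine in Italian. All class and function names are hypothetical and do not come from the paper. The pairing of source and target words is assumed to be given, and the entity-level component, which is cut off in the abstract above, is left out.

```python
from enum import Enum
from typing import List, Optional, Tuple


class SourceGender(Enum):
    """Semantic gender tags for English words, as listed in the abstract."""
    M = "Masculine"
    F = "Feminine"
    A = "Ambiguous"


class TargetGender(Enum):
    """Grammatical gender tags for Italian words, as listed in the abstract."""
    M = "Masculine"
    F = "Feminine"


# One aligned word pair: (English tag, Italian tag).
# How words are aligned is an assumption of this sketch, not part of the source.
TagPair = Tuple[SourceGender, TargetGender]


def masculine_share_for_ambiguous(pairs: List[TagPair]) -> Optional[float]:
    """Fraction of ambiguous English words realised as masculine in Italian.

    Returns None when no pair has an ambiguous source tag.
    """
    targets = [tgt for src, tgt in pairs if src is SourceGender.A]
    if not targets:
        return None
    masculine = sum(1 for tgt in targets if tgt is TargetGender.M)
    return masculine / len(targets)


# Hypothetical tag pairs, invented for illustration only.
example_pairs: List[TagPair] = [
    (SourceGender.A, TargetGender.M),
    (SourceGender.A, TargetGender.M),
    (SourceGender.F, TargetGender.F),
    (SourceGender.A, TargetGender.F),
]
share = masculine_share_for_ambiguous(example_pairs)
```

A high share on ambiguous source words would be one simple signal of the masculine default the abstract describes; the paper's actual evaluation procedure may differ.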
Taxonomy
Topics: Natural Language Processing Techniques · Authorship Attribution and Profiling · Topic Modeling
