MORPHOGEN: A Multilingual Benchmark for Evaluating Gender-Aware Morphological Generation
Mehul Agarwal, Aditya Aggarwal, Arnav Goel, Medha Hira, Anubha Gupta

TL;DR
MORPHOGEN is a new multilingual benchmark dataset designed to evaluate large language models' ability to handle gender-aware morphological transformations in French, Arabic, and Hindi.
Contribution
It introduces a synthetic dataset and benchmark for assessing gender-aware morphological generation by multilingual LLMs in three typologically diverse languages.
Findings
Current models show significant gaps in handling grammatical gender.
Insights into how models perform gender transformations in morphologically rich languages.
Abstract
While multilingual large language models (LLMs) perform well on high-level tasks like translation and question answering, their ability to handle grammatical gender and morphological agreement remains underexplored. In morphologically rich languages, gender influences verb conjugation, pronouns, and even first-person constructions with explicit and implicit mentions of gender. We introduce MORPHOGEN, a morphologically grounded large-scale benchmark dataset for evaluating gender-aware generation in three typologically diverse grammatically gendered languages: French, Arabic, and Hindi. The core task, GENFORM, requires models to rewrite a first-person sentence in the opposite gender while preserving its meaning and structure. We construct a high-quality synthetic dataset spanning these three languages and benchmark 15 popular multilingual LLMs (2B-70B) on their ability to perform this…
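A minimal Python sketch can make the GENFORM task format concrete. Everything in it is a hypothetical illustration: the GenFormItem fields, the prompt wording in build_prompt, and the strict exact-match scorer. The abstract does not describe MORPHOGEN's actual data schema or evaluation metrics. The three sentence pairs are hand-written examples, not items from the dataset.

```python
"""Hypothetical sketch of a GENFORM-style item and a strict exact-match scorer."""
from dataclasses import dataclass
import unicodedata


@dataclass(frozen=True)
class GenFormItem:
    # Illustrative schema only; not MORPHOGEN's released format.
    lang: str           # "fr", "ar", or "hi"
    source: str         # first-person sentence in the original gender
    source_gender: str  # "m" or "f"
    reference: str      # gold rewrite in the opposite gender


# Hand-written examples: only the gender-marked word changes.
ITEMS = [
    GenFormItem("fr", "Je suis content de te voir.", "m", "Je suis contente de te voir."),
    GenFormItem("hi", "मैं घर जा रहा हूँ।", "m", "मैं घर जा रही हूँ।"),
    GenFormItem("ar", "أنا سعيد جدا.", "m", "أنا سعيدة جدا."),
]


def build_prompt(item: GenFormItem) -> str:
    # Hypothetical instruction template for the rewrite task.
    target = "feminine" if item.source_gender == "m" else "masculine"
    return (
        f"Rewrite the following first-person sentence so the speaker is {target} "
        "(the opposite gender), keeping its meaning and structure unchanged.\n"
        f"Sentence: {item.source}"
    )


def normalize(text: str) -> str:
    # NFC-normalize and collapse runs of whitespace so spacing differences are not penalized.
    return " ".join(unicodedata.normalize("NFC", text).split())


def exact_match(prediction: str, item: GenFormItem) -> bool:
    # True only if the normalized prediction equals the normalized gold rewrite.
    return normalize(prediction) == normalize(item.reference)


def accuracy(predictions: list[str], items: list[GenFormItem]) -> float:
    # Fraction of items whose prediction exactly matches the reference.
    if len(predictions) != len(items):
        raise ValueError("need exactly one prediction per item")
    if not items:
        return 0.0
    hits = sum(exact_match(p, it) for p, it in zip(predictions, items))
    return hits / len(items)


if __name__ == "__main__":
    preds = [
        "Je suis contente de te voir.",  # correct
        "मैं घर जा रहा हूँ।",              # unchanged, so counted as wrong
        "أنا  سعيدة جدا.",                # correct apart from a double space
    ]
    print(build_prompt(ITEMS[0]))
    print(f"accuracy = {accuracy(preds, ITEMS):.3f}")
```

Strict string matching is the simplest possible scorer, but it counts any acceptable alternative rewrite as an error; a fuller evaluation would also need to check that meaning and structure are preserved, which this sketch does not attempt.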
