MORPHOGEN: A Multilingual Benchmark for Evaluating Gender-Aware Morphological Generation

Mehul Agarwal; Aditya Aggarwal; Arnav Goel; Medha Hira; Anubha Gupta

arXiv:2604.18914·cs.CL·April 22, 2026

TL;DR

MORPHOGEN is a new multilingual benchmark dataset designed to evaluate large language models' ability to handle gender-aware morphological transformations in French, Arabic, and Hindi.

Contribution

The paper introduces a synthetic dataset and benchmark for assessing gender-aware morphological generation by multilingual LLMs in French, Arabic, and Hindi.

Findings

01. Significant gaps in current models' gender handling capabilities.

02. Insights into how models perform gender transformations in morphologically rich languages.

Abstract

While multilingual large language models (LLMs) perform well on high-level tasks like translation and question answering, their ability to handle grammatical gender and morphological agreement remains underexplored. In morphologically rich languages, gender influences verb conjugation, pronouns, and even first-person constructions with explicit and implicit mentions of gender. We introduce MORPHOGEN, a morphologically grounded large-scale benchmark dataset for evaluating gender-aware generation in three typologically diverse grammatically gendered languages: French, Arabic, and Hindi. The core task, GENFORM, requires models to rewrite a first-person sentence in the opposite gender while preserving its meaning and structure. We construct a high-quality synthetic dataset spanning these three languages and benchmark 15 popular multilingual LLMs (2B-70B) on their ability to perform this…
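
To make the GENFORM task concrete, here is a minimal Python sketch of how one item could be posed to a model and scored. It is an illustration under stated assumptions, not the authors' pipeline: the `GenformExample` fields, the prompt wording, the two example sentences, and the exact-match metric are all hypothetical and are not taken from the paper or the dataset.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class GenformExample:
    """One hypothetical GENFORM item; field names are illustrative, not the dataset's schema."""
    language: str
    source: str      # first-person sentence in the original gender
    reference: str   # the same sentence rewritten in the opposite gender


def build_prompt(example: GenformExample) -> str:
    """Format an instruction asking a model for the opposite-gender rewrite (assumed wording)."""
    return (
        f"Rewrite the following {example.language} first-person sentence so that "
        "the speaker has the opposite grammatical gender. Preserve the meaning "
        "and structure.\n"
        f"Sentence: {example.source}\n"
        "Rewrite:"
    )


def normalize(text: str) -> str:
    """NFC-normalize and collapse whitespace so equivalent strings compare equal."""
    return " ".join(unicodedata.normalize("NFC", text).split())


def exact_match(prediction: str, reference: str) -> bool:
    """Assumed metric: strict string equality after normalization."""
    return normalize(prediction) == normalize(reference)


# Illustrative items written for this sketch, not drawn from the dataset.
examples = [
    GenformExample("French", "Je suis content.", "Je suis contente."),
    GenformExample("Hindi", "मैं जाता हूँ।", "मैं जाती हूँ।"),
]

# Stand-ins for model outputs: the first is correct (with a stray double space),
# the second copies the source sentence unchanged.
predictions = ["Je suis  contente.", "मैं जाता हूँ।"]

accuracy = sum(
    exact_match(p, e.reference) for p, e in zip(predictions, examples)
) / len(examples)
print(accuracy)
```

Exact match is used here only because it is simple to verify. A sentence can have more than one valid opposite-gender rewrite, so a real evaluation may need a more tolerant metric.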

Code & Models

Datasets

ag2003/morphogen (dataset, 224 downloads)
