Quantifying the Semantic Core of Gender Systems
Adina Williams, Ryan Cotterell, Lawrence Wolf-Sonkin, Damián Blasi, and Hanna Wallach

TL;DR
This paper investigates, across multiple languages, whether the grammatical gender of inanimate nouns is assigned arbitrarily or is systematically related to their lexical semantics, and finds a significant correlation in 18 languages.
Contribution
It presents a large-scale study that uses canonical correlation analysis (CCA) to quantify the semantic basis of gender assignment for inanimate nouns across typologically diverse languages.
Findings
18 languages show significant correlation between gender and semantics
In these languages, gender assignment is not entirely arbitrary but is systematically related to lexical meaning
The study provides empirical evidence for semantic patterns in gender systems
Abstract
Many of the world's languages employ grammatical gender on the lexeme. For example, in Spanish, the word for 'house' (casa) is feminine, whereas the word for 'paper' (papel) is masculine. To a speaker of a genderless language, this assignment seems to exist with neither rhyme nor reason. But is the assignment of inanimate nouns to grammatical genders truly arbitrary? We present the first large-scale investigation of the arbitrariness of noun-gender assignments. To that end, we use canonical correlation analysis to correlate the grammatical gender of inanimate nouns with an externally grounded definition of their lexical semantics. We find that 18 languages exhibit a significant correlation between grammatical gender and lexical semantics.
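The core step named in the abstract, correlating gender labels with a semantic representation of each noun via CCA, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: gender is one-hot encoded, each noun's semantics is assumed to be a real-valued vector (the hypothetical matrix `E`), the first canonical correlation is computed from the principal angles between the two centered column spaces, and significance is assessed with a permutation test, which may differ from the procedure used in the paper. All function names are hypothetical. Unregularized CCA overfits as the vector dimension approaches the number of nouns, so real data would call for dimensionality reduction or regularization.

```python
import numpy as np


def orthonormal_basis(M, tol=1e-10):
    """Orthonormal basis for the column space of the column-centered matrix M."""
    Mc = M - M.mean(axis=0)
    U, s, _ = np.linalg.svd(Mc, full_matrices=False)
    rank = int(np.sum(s > tol * s.max()))  # drop directions lost to centering (e.g. one-hot)
    return U[:, :rank]


def first_canonical_correlation(X, Y):
    """Largest canonical correlation between the columns of X and Y."""
    Qx, Qy = orthonormal_basis(X), orthonormal_basis(Y)
    # Singular values of Qx^T Qy are the canonical correlations (cosines of principal angles).
    return float(np.linalg.svd(Qx.T @ Qy, compute_uv=False)[0])


def permutation_p_value(G, E, n_perm=200, seed=0):
    """Assumed significance test: shuffle gender labels across nouns to break any link to semantics."""
    rng = np.random.default_rng(seed)
    observed = first_canonical_correlation(G, E)
    exceed = sum(
        first_canonical_correlation(G[rng.permutation(len(G))], E) >= observed
        for _ in range(n_perm)
    )
    return observed, (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical toy data: 200 "nouns" with 5-dimensional semantic vectors.
    rng = np.random.default_rng(1)
    E = rng.normal(size=(200, 5))
    labels = (E[:, 0] > 0).astype(int)  # gender depends on one semantic dimension
    G = np.eye(2)[labels]  # one-hot gender matrix (2 genders)
    rho, p = permutation_p_value(G, E)
    print(f"first canonical correlation = {rho:.2f}, permutation p = {p:.3f}")
```

On this synthetic data, where gender is driven by one semantic dimension, the correlation is large and no shuffled labeling matches it. With labels drawn independently of `E`, the correlation would instead stay near the chance level implied by the dimensions and sample size.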
Taxonomy
Topics: Gender Studies in Language · Natural Language Processing Techniques · Linguistic Variation and Morphology
