An exploration of the encoding of grammatical gender in word embeddings
Hartger Veeman, Ali Basirat

TL;DR
This paper investigates how the grammatical gender of nouns is encoded in word embeddings across different languages, and finds that morphological features and context affect both the encoding and the accuracy with which gender can be classified.
Contribution
It compares various word embeddings for encoding grammatical gender and highlights the impact of morphological features and context on this encoding.
Findings
Gender encoding overlaps across Swedish, Danish, and Dutch embeddings.
Adding more contextual information to contextualized embeddings reduces classifier performance.
Removing articles from training data significantly decreases classification accuracy.
Abstract
The vector representation of words, known as word embeddings, has opened a new research approach in linguistic studies. These representations can capture different types of information about words. The grammatical gender of nouns is a typical classification of nouns based on their formal and semantic properties. The study of grammatical gender based on word embeddings can give insight into discussions on how grammatical genders are determined. In this study, we compare different sets of word embeddings according to the accuracy of a neural classifier determining the grammatical gender of nouns. It is found that there is an overlap in how grammatical gender is encoded in Swedish, Danish, and Dutch embeddings. Our experimental results on the contextualized embeddings pointed out that adding more contextual information to embeddings is detrimental to the classifier's performance. We also…
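The evaluation described in the abstract, scoring a set of embeddings by how accurately a classifier recovers noun gender from them, can be sketched roughly as follows. This is a minimal illustration and not the authors' setup: it swaps the paper's neural classifier for a plain logistic-regression probe, it assumes only two gender classes, and the "embeddings" are synthetic Gaussian vectors produced by a hypothetical helper, `make_toy_data`, rather than real Swedish, Danish, or Dutch vectors. All function names and hyperparameters are assumptions made for the example.

```python
import math
import random


def make_toy_data(n_per_class, dim, rng):
    """Hypothetical stand-in for (embedding, gender) pairs.

    Each class is a Gaussian cluster; real experiments would load
    pretrained noun embeddings and gold gender labels instead.
    """
    data = []
    for label in (0, 1):
        center = 1.0 if label == 1 else -1.0
        for _ in range(n_per_class):
            vec = [rng.gauss(center, 1.0) for _ in range(dim)]
            data.append((vec, label))
    rng.shuffle(data)
    return data


def predict_proba(weights, bias, vec):
    """Probability that the noun belongs to class 1."""
    z = bias + sum(w * x for w, x in zip(weights, vec))
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
    return 1.0 / (1.0 + math.exp(-z))


def train(data, dim, epochs=20, lr=0.1):
    """Fit a logistic-regression probe with plain SGD."""
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(epochs):
        for vec, label in data:
            err = predict_proba(weights, bias, vec) - label
            for i in range(dim):
                weights[i] -= lr * err * vec[i]
            bias -= lr * err
    return weights, bias


def accuracy(weights, bias, data):
    """Fraction of nouns whose predicted class matches the label."""
    correct = sum(
        (predict_proba(weights, bias, vec) >= 0.5) == (label == 1)
        for vec, label in data
    )
    return correct / len(data)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
dim = 8
data = make_toy_data(200, dim, rng)
train_set, test_set = data[:300], data[300:]

weights, bias = train(train_set, dim)
test_accuracy = accuracy(weights, bias, test_set)
print(f"held-out accuracy: {test_accuracy:.3f}")
```

Under this framing, comparing embedding sets amounts to repeating the same train-and-score loop with each set and comparing the held-out accuracies. Testing for cross-lingual overlap would, under the same assumptions, mean training the probe on one language's vectors and scoring it on another's.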
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Authorship Attribution and Profiling
