What do Language Representations Really Represent?
Johannes Bjerva, Robert Östling, Maria Han Veiga, Jörg Tiedemann, Isabelle Augenstein

TL;DR
This paper investigates what linguistic features are captured by language representations in neural models, revealing that structural similarities between languages are most strongly reflected, with implications for NLP and linguistic typology.
Contribution
The study provides a detailed analysis of the factors influencing language representations, highlighting the prominence of structural similarity over genetic relationships.
Findings
Structural similarity correlates most with language representation similarity.
Genetic relationships act as a confounding factor and are less predictive of representation similarity.
Language representations capture structural features more than genetic or geographical factors.
Abstract
A neural language model trained on a text corpus can be used to induce distributed representations of words, such that similar words end up with similar representations. If the corpus is multilingual, the same model can be used to learn distributed representations of languages, such that similar languages end up with similar representations. We show that this holds even when the multilingual corpus has been translated into English, by picking up the faint signal left by the source languages. However, just like it is a thorny problem to separate semantic from syntactic similarity in word representations, it is not obvious what type of similarity is captured by language representations. We investigate correlations and causal relationships between language representations learned from translations on one hand, and genetic, geographical, and several levels of structural similarity between…
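The abstract describes correlating similarity between learned language representations with genetic, geographical and structural similarity between languages. The sketch below shows one simple way to measure such a correlation. It is not necessarily the authors' exact procedure. It makes the following assumptions:

- Each language has a dense vector, compared with cosine similarity.
- Each language has a binary typological feature vector, compared by the fraction of features on which two languages agree.
- The two pairwise similarity lists are compared with a Spearman rank correlation, with no permutation test for significance.

All function names (`cosine`, `feature_overlap`, `ranks`, `spearman`, `similarity_correlation`) and the toy vectors for `da`, `sv` and `fi` are hypothetical and invented for illustration. They are not data from the paper.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def feature_overlap(f, g):
    """Fraction of (hypothetical) binary typological features two languages share."""
    return sum(a == b for a, b in zip(f, g)) / len(f)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def similarity_correlation(embeddings, features):
    """Correlate pairwise representation similarity with pairwise structural similarity."""
    langs = sorted(embeddings)
    emb_sims, struct_sims = [], []
    for a, b in combinations(langs, 2):
        emb_sims.append(cosine(embeddings[a], embeddings[b]))
        struct_sims.append(feature_overlap(features[a], features[b]))
    return spearman(emb_sims, struct_sims)


if __name__ == "__main__":
    # Toy, made-up language vectors and binary features (not from the paper).
    embeddings = {"da": [1.0, 0.0], "sv": [0.9, 0.1], "fi": [0.0, 1.0]}
    features = {"da": [1, 1, 0, 0], "sv": [1, 1, 0, 1], "fi": [0, 0, 1, 1]}
    rho = similarity_correlation(embeddings, features)
    print(round(rho, 6))  # → 1.0
```

In this toy data the pair ranked most similar by the vectors (`da`/`sv`) is also the pair sharing the most features, so the ranks agree perfectly. With real language representations, a genetic or geographical similarity list could be substituted for `feature_overlap` to compare how strongly each factor correlates.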
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
