Language discrimination and clustering via a neural network approach
Angelo Mariano, Giorgio Parisi, Saverio Pascazio

TL;DR
This paper uses neural networks to classify twenty-one Indo-European languages from written text: it defines a distance between languages, builds a dendrogram from that distance, and identifies language subgroups by cutting the dendrogram according to an entropic criterion.
Contribution
It introduces a neural network-based method to quantify language similarities and reveals hierarchical language groupings through dendrogram analysis.
Findings
Identification of four or five language subgroups, depending on where the dendrogram is cut
Neural network-derived language distance measure
Hierarchical clustering of Indo-European languages
Abstract
We classify twenty-one Indo-European languages starting from written text. We use neural networks in order to define a distance among different languages, construct a dendrogram and analyze the ultrametric structure that emerges. Four or five subgroups of languages are identified, according to the "cut" of the dendrogram, drawn with an entropic criterion. The results and the method are discussed.
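
The pipeline described in the abstract (a pairwise distance, a dendrogram, an ultrametric structure, a cut) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the labels "a" to "e" and their distances are made-up toy values, not results from the paper; single-linkage clustering stands in for whatever tree construction the authors use; and the cut threshold is chosen by hand rather than by the paper's entropic criterion, which the abstract does not specify.

```python
from itertools import combinations

# Toy symmetric distances between five hypothetical items (NOT data from the paper).
names = ["a", "b", "c", "d", "e"]
raw = {("a", "b"): 0.1, ("a", "c"): 0.5, ("b", "c"): 0.4, ("d", "e"): 0.2,
       ("a", "d"): 0.9, ("a", "e"): 0.8, ("b", "d"): 0.85, ("b", "e"): 0.95,
       ("c", "d"): 0.7, ("c", "e"): 0.75}
dist = {frozenset(pair): d for pair, d in raw.items()}


def single_linkage(names, dist):
    """Agglomerative single-linkage clustering.

    Returns the list of merges (height, cluster_a, cluster_b) in the order
    they happen, and the induced tree ("cophenetic") distance for every pair.
    """
    clusters = [frozenset([n]) for n in names]
    merges, coph = [], {}
    while len(clusters) > 1:
        best = None
        # Find the two clusters whose closest members are nearest to each other.
        for a, b in combinations(clusters, 2):
            d = min(dist[frozenset((x, y))] for x in a for y in b)
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        # Every cross pair first joins at this height: that is its tree distance.
        for x in a:
            for y in b:
                coph[frozenset((x, y))] = d
        merges.append((d, a, b))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges, coph


def is_ultrametric(names, d, tol=1e-12):
    """Check d(x, z) <= max(d(x, y), d(y, z)) for all triples.

    Equivalently: in every triangle the two largest sides are equal.
    """
    for x, y, z in combinations(names, 3):
        sides = sorted([d[frozenset((x, y))], d[frozenset((x, z))],
                        d[frozenset((y, z))]])
        if sides[2] - sides[1] > tol:
            return False
    return True


def cut(names, merges, threshold):
    """Groups obtained by keeping only the merges below `threshold`."""
    groups = {n: frozenset([n]) for n in names}
    for height, a, b in merges:
        if height < threshold:
            merged = a | b
            for n in merged:
                groups[n] = merged
    return set(groups.values())


merges, coph = single_linkage(names, dist)
print("merge heights:", [h for h, _, _ in merges])
print("raw distances ultrametric: ", is_ultrametric(names, dist))
print("tree distances ultrametric:", is_ultrametric(names, coph))
print("groups at threshold 0.5:", sorted(sorted(g) for g in cut(names, merges, 0.5)))
```

The raw toy distances violate the ultrametric inequality while the distances read off the tree satisfy it, which is the sense in which a dendrogram imposes an ultrametric structure. Moving the threshold changes the number of groups, which is why the number of subgroups depends on where the dendrogram is cut.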
Taxonomy
Topics: Neural Networks and Applications
