Exploring Language Similarities with Dimensionality Reduction Technique
Sangarshanan Veeraraghavan

TL;DR
This paper investigates the similarities among various languages by applying dimensionality reduction to visualize their relationships, aiming to improve language modeling and translation for less-studied languages.
Contribution
It introduces a method to represent multiple languages in a lower-dimensional space to visualize their similarities and aid in developing better language models.
Findings
Languages can be effectively visualized in 2D to reveal their similarities.
The approach can assist in understanding and modeling lesser-known languages.
Dimensionality reduction helps leverage existing models for new languages.
Abstract
In recent years, several novel models have been developed to process natural language, and accurate language translation systems have helped us overcome geographical barriers and communicate ideas effectively. These models are developed mostly for a few widely used languages, while other languages are ignored. Most spoken languages share lexical, syntactic, and semantic similarities with several other languages, and knowing this can help us leverage existing models to build more specific and accurate models for other languages. Here I explore the idea of representing several popular languages in a lower dimension so that their similarities can be visualized using simple two-dimensional plots. This can even help us understand newly discovered languages that may not share their vocabulary with any existing language.
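The abstract does not say which features or which reduction method the paper uses, so the following is only a minimal sketch of the general idea under assumed choices. Each language is described by a character-bigram frequency profile, and the profiles are reduced to two dimensions with PCA computed from the centered Gram matrix by power iteration. The sample sentences and all function names (`bigram_profile`, `to_matrix`, `project_2d`, and so on) are hypothetical and chosen for illustration.

```python
from collections import Counter
import math

# Hypothetical toy samples (assumption): a real study would use large corpora.
SAMPLES = {
    "english": "the fox jumps over the dog and the cat sleeps in the house",
    "german": "der fuchs springt auf den hund und die katze liegt im haus",
    "dutch": "de vos springt over de hond en de kat slaapt in het huis",
    "spanish": "el zorro salta sobre el perro y el gato duerme en la casa",
    "italian": "la volpe salta sopra il cane e il gatto dorme nella casa",
}


def bigram_profile(text):
    """Relative frequencies of the character bigrams in text."""
    counts = Counter(text[i:i + 2] for i in range(len(text) - 1))
    total = sum(counts.values())
    return {gram: n / total for gram, n in counts.items()}


def to_matrix(profiles):
    """Align sparse profiles into dense rows over a shared, sorted vocabulary."""
    vocab = sorted(set().union(*profiles))
    return [[p.get(g, 0.0) for g in vocab] for p in profiles]


def center(rows):
    """Subtract the per-feature mean from every row."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def top_eigenpair(matrix, iterations=500):
    """Power iteration for the dominant eigenpair of a symmetric PSD matrix."""
    vec = [1.0 / (i + 1) for i in range(len(matrix))]  # fixed, non-uniform start
    for _ in range(iterations):
        nxt = [sum(a * v for a, v in zip(row, vec)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
    # Rayleigh quotient gives the eigenvalue estimate for the unit vector.
    value = sum(v * sum(a * w for a, w in zip(row, vec))
                for v, row in zip(vec, matrix))
    return value, vec


def project_2d(rows):
    """PCA scores on the first two components, via the centered Gram matrix."""
    centered = center(rows)
    gram = [[sum(a * b for a, b in zip(r, s)) for s in centered]
            for r in centered]
    coords = [[] for _ in rows]
    for _ in range(2):
        value, vec = top_eigenpair(gram)
        scale = math.sqrt(max(value, 0.0))
        for point, v in zip(coords, vec):
            point.append(scale * v)
        # Deflate: remove the component just found before the next pass.
        gram = [[g - value * vi * vj for g, vj in zip(row, vec)]
                for row, vi in zip(gram, vec)]
    return coords


names = list(SAMPLES)
coords = project_2d(to_matrix([bigram_profile(SAMPLES[n]) for n in names]))
for name, (x, y) in zip(names, coords):
    print(f"{name:8s} {x:+.3f} {y:+.3f}")
```

The printed pairs can be passed to any scatter-plot tool to obtain the kind of two-dimensional view the abstract describes. With samples this small the layout is only illustrative, and a different feature set or reduction method could give a different picture.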
Taxonomy
Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Text and Document Classification Technologies
