# Exploring Language Similarities with Dimensionality Reduction Technique

**Authors:** Sangarshanan Veeraraghavan

arXiv: 1902.06092 · 2019-02-19

## TL;DR

This paper investigates the similarities among various languages by applying dimensionality reduction to visualize their relationships, aiming to improve language modeling and translation for less-studied languages.

## Contribution

It introduces a method to represent multiple languages in a lower-dimensional space to visualize their similarities and aid in developing better language models.
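This summary does not say which features or which reduction algorithm the paper uses. The following is a minimal sketch under two assumptions: each language is already encoded as a numeric feature vector (for instance, n-gram frequencies), and PCA is the reduction. It projects the vectors to two dimensions using only the standard library. The language names and vectors are hypothetical, and `pca_2d` and `_power_iteration` are illustrative helpers, not part of the paper.

```python
import math


def _power_iteration(matrix, iterations=500):
    """Return (eigenvalue, unit eigenvector) for the dominant eigenpair of a symmetric PSD matrix."""
    n = len(matrix)
    # Deterministic, non-uniform start vector to avoid an unlucky orthogonal start.
    v = [1.0 + 0.1 * i for i in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue estimate for the converged vector.
    eigenvalue = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n))
    return eigenvalue, v


def pca_2d(vectors):
    """Project each row of `vectors` onto its first two principal components (PCA scores)."""
    n, d = len(vectors), len(vectors[0])
    means = [sum(row[j] for row in vectors) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in vectors]
    # Gram matrix of the centred data (n x n); its eigenvectors scaled by sqrt(eigenvalue)
    # equal the PCA scores, and it is cheaper than the d x d covariance when n << d.
    gram = [[sum(a * b for a, b in zip(centered[i], centered[k])) for k in range(n)]
            for i in range(n)]
    coords = [[0.0, 0.0] for _ in range(n)]
    for component in range(2):
        lam, u = _power_iteration(gram)
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][component] = u[i] * scale
        # Deflate: remove the found component before searching for the next one.
        gram = [[gram[i][k] - lam * u[i] * u[k] for k in range(n)] for i in range(n)]
    return coords


# Hypothetical feature vectors for four unnamed languages.
languages = {
    "lang_a": [0.9, 0.1, 0.0, 0.2],
    "lang_b": [0.8, 0.2, 0.1, 0.2],
    "lang_c": [0.1, 0.9, 0.7, 0.0],
    "lang_d": [0.0, 0.8, 0.9, 0.1],
}
points = pca_2d(list(languages.values()))
for name, (x, y) in zip(languages, points):
    print(f"{name}: ({x:+.3f}, {y:+.3f})")
```

The resulting 2D points can be scattered on a plot so that languages with similar feature vectors land near each other. Working on the small Gram matrix keeps the sketch cheap when there are few languages but many features; in practice a library implementation such as scikit-learn's `PCA` would be the usual choice.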

## Key findings

- Languages can be effectively visualized in 2D to reveal their similarities.
- The approach can assist in understanding and modeling lesser-known languages.
- Similarities revealed through dimensionality reduction can help adapt existing models to other languages.

## Abstract

In recent years, several novel models have been developed to process natural language, and accurate language translation systems have helped us overcome geographical barriers and communicate ideas effectively. These models are developed mostly for a few widely used languages, while other languages are ignored. Most spoken languages share lexical, syntactic and semantic similarities with several other languages, and knowing these similarities can help us leverage existing models to build more specific and accurate models for other languages. Here I explore the idea of representing several well-known languages in a lower dimension so that their similarities can be visualized using simple two-dimensional plots. This can even help us understand newly discovered languages that may not share their vocabulary with any existing language.
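The abstract mentions lexical similarity between languages but does not specify how it is measured. One common proxy, assumed here and not confirmed as the paper's method, is the cosine similarity between character-trigram frequency profiles. The one-sentence samples below are hypothetical illustrations rather than data from the paper, and `trigram_profile` and `cosine_similarity` are illustrative helper names.

```python
import math
from collections import Counter


def trigram_profile(text):
    """Count character trigrams, padding with spaces so word edges are captured."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine_similarity(p, q):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    # Indexing a Counter with a missing key returns 0 without inserting it.
    dot = sum(count * q[gram] for gram, count in p.items())
    norm_p = math.sqrt(sum(c * c for c in p.values()))
    norm_q = math.sqrt(sum(c * c for c in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


# Hypothetical one-sentence samples; a real study would use large corpora.
samples = {
    "spanish": "el gato come pescado",
    "portuguese": "o gato come peixe",
    "english": "the cat eats fish",
}
profiles = {lang: trigram_profile(text) for lang, text in samples.items()}
for a in samples:
    for b in samples:
        if a < b:
            print(f"{a} vs {b}: {cosine_similarity(profiles[a], profiles[b]):.3f}")
```

With these samples, Spanish and Portuguese share many trigrams (" ga", "gat", "com", and others), so their score is well above the Spanish and English pair. Profiles like these are one possible input to a dimensionality-reduction step that places lexically similar languages close together.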

---
Source: https://tomesphere.com/paper/1902.06092