# Language comparison via network topology

**Authors:** Blaž Škrlj, Senja Pollak

arXiv: 1907.06944 · 2019-12-24

## TL;DR

This paper introduces a network-based approach to compare languages by representing textual data as directed, weighted networks and analyzing their topological properties to reveal linguistic similarities and differences.

## Contribution

It proposes a novel text2net algorithm for network representation of text and demonstrates how network topology metrics can be used for cross-lingual language comparison.
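To make the idea of a directed, weighted text network concrete, here is a minimal Python sketch. It is not the authors' text2net algorithm, whose details are not given in this summary. It assumes, purely for illustration, that nodes are lowercased whitespace-separated tokens and that each pair of adjacent tokens adds one unit of weight to a directed edge. The name `build_token_network` is hypothetical.

```python
from collections import Counter


def build_token_network(sentences):
    """Return a dict mapping (source, target) token pairs to edge weights.

    Assumption (not from the paper): edges link adjacent tokens, and the
    weight is the number of times the pair occurs in that order.
    """
    edges = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        # Pair every token with its successor to form directed edges.
        for source, target in zip(tokens, tokens[1:]):
            edges[(source, target)] += 1
    return dict(edges)


# Toy input, invented for illustration only.
network = build_token_network(["the cat sat", "the cat ran"])
```

A plain dictionary of edge weights keeps the sketch dependency-free; a graph library would be a natural replacement once community detection or other heavier metrics are needed.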

## Key findings

- Network community structure captures some of the known differences between languages.
- The method scales to corpora of hundreds of thousands of aligned sentences on an off-the-shelf laptop.
- Other network properties point to novel opportunities for linguistic study.

## Abstract

Modeling relations between languages can offer understanding of language characteristics and uncover similarities and differences between languages. Automated methods applied to large textual corpora can be seen as opportunities for novel statistical studies of language development over time, as well as for improving cross-lingual natural language processing techniques. In this work, we first propose how to represent textual data as a directed, weighted network by the text2net algorithm. We next explore how various fast, network-topological metrics, such as network community structure, can be used for cross-lingual comparisons. In our experiments, we employ eight different network topology metrics, and empirically showcase, on a parallel corpus, how the methods can be used for modeling the relations between nine selected languages. We demonstrate that the proposed method scales to large corpora consisting of hundreds of thousands of aligned sentences on an off-the-shelf laptop. We observe that, on the one hand, properties such as communities capture some of the known differences between the languages, while others can be seen as novel opportunities for linguistic studies.
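The cross-lingual comparison described above can be pictured as reducing each language's network to a short vector of topology metrics and then measuring how far apart those vectors are. The sketch below is an assumption-laden illustration, not the paper's procedure. The eight metrics used by the authors are not listed in this summary, so density, mean out-degree, and mean edge weight stand in for them, and Euclidean distance is an arbitrary choice. The function names and the toy edge lists are hypothetical.

```python
import math


def topology_profile(edges):
    """Summarise a directed, weighted network given as {(src, dst): weight}.

    Stand-in metrics (assumed, not the paper's eight):
    density, mean out-degree, mean edge weight.
    """
    nodes = {node for pair in edges for node in pair}
    n, m = len(nodes), len(edges)
    # Directed density: edges present over ordered node pairs (no self-loops).
    density = m / (n * (n - 1)) if n > 1 else 0.0
    mean_out_degree = m / n if n else 0.0
    mean_weight = sum(edges.values()) / m if m else 0.0
    return (density, mean_out_degree, mean_weight)


def profile_distance(profile_a, profile_b):
    """Euclidean distance between two metric vectors."""
    return math.dist(profile_a, profile_b)


# Toy edge lists invented for illustration; they are not data from the paper.
english = {("the", "cat"): 2, ("cat", "sat"): 1, ("cat", "ran"): 1}
slovene = {("mačka", "sedi"): 1, ("mačka", "teče"): 1}

distance = profile_distance(topology_profile(english), topology_profile(slovene))
```

In practice the metrics would likely need to be normalised before being combined, since raw values such as density and mean weight live on different scales.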

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.06944/full.md

## Figures

16 figures with captions in the complete paper: https://tomesphere.com/paper/1907.06944/full.md

## References

27 references — full list in the complete paper: https://tomesphere.com/paper/1907.06944/full.md

---
Source: https://tomesphere.com/paper/1907.06944