Sampling the Swadesh List to Identify Similar Languages with Tree Spaces

Garett Ordway; Vic Patrangenaru

arXiv:2405.06549·stat.AP·May 13, 2024

TL;DR

This paper explores a novel method for analyzing language relationships using simplified tree spaces and clustering techniques based on Swadesh list data, aiming to identify language ancestry and similarities.

Contribution

It introduces a new approach that combines data analysis on open books, the 3-spider tree space, and single linkage clustering to study language relationships from Swadesh lists.
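As a rough illustration of the single linkage step named above, the following minimal sketch merges clusters by their closest members. It is not the authors' code: the labels `A` to `D` and the distance matrix are hypothetical placeholders (for instance, a share of non-matching Swadesh items would be one possible choice of distance), and the function name `single_linkage` is introduced here only for illustration.

```python
from itertools import combinations


def single_linkage(labels, dist, n_clusters):
    """Agglomerative single linkage: repeatedly merge the two clusters
    whose closest members are nearest, until n_clusters remain."""
    clusters = [frozenset([i]) for i in range(len(labels))]
    merges = []  # (members of the merged cluster, merge height)

    def gap(a, b):
        # Single linkage distance: minimum over all cross-cluster pairs.
        return min(dist[i][j] for i in a for j in b)

    while len(clusters) > n_clusters:
        a, b = min(combinations(clusters, 2), key=lambda pair: gap(*pair))
        height = gap(a, b)
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append((sorted(labels[i] for i in a | b), height))

    return [sorted(labels[i] for i in c) for c in clusters], merges


# Hypothetical toy input: four unnamed languages, made-up distances.
labels = ["A", "B", "C", "D"]
dist = [
    [0.0, 0.2, 0.7, 0.8],
    [0.2, 0.0, 0.6, 0.9],
    [0.7, 0.6, 0.0, 0.3],
    [0.8, 0.9, 0.3, 0.0],
]
clusters, merges = single_linkage(labels, dist, n_clusters=2)
print(clusters)
print(merges)
```

The recorded merge heights are what a dendrogram would plot; cutting at a different `n_clusters` gives a coarser or finer grouping of the same tree.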

Findings

01 Identified non-sticky and sticky sample means, indicating different ancestral relationships.

02 Demonstrated the use of 3-spider tree spaces for language clustering.

03 Provided initial results on language ancestry inference.

Abstract

Communication plays a vital role in human interaction. Studying language is a worthwhile task and has more recently become quantitative in nature, with the development of fields like quantitative comparative linguistics and lexicostatistics. With respect to the authors' own native languages, the ancestry of the English language and the Latin alphabet are of primary interest. The Indo-European tree traces many modern languages back to the Proto-Indo-European root. Swadesh's cognates played a large role in developing that historical perspective, in which some of the primary branches are Germanic, Celtic, Italic, and Balto-Slavic. This paper uses data analysis on open books, where the simplest singular space is the 3-spider, a union T_3 of three rays with their endpoints glued at a point 0, which can represent these tree spaces for language clustering. These trees are built using a single…
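To make the 3-spider concrete, here is a minimal sketch of its geodesic distance and of a sample Fréchet mean that can be sticky (at the center 0) or non-sticky (on a leg). This is not taken from the paper or its repository. It assumes a point is encoded as a hypothetical pair `(leg, r)` with `leg` in {0, 1, 2} and `r >= 0` the distance from the center, and it uses the standard folded-average characterization of the Fréchet mean on a spider. The boundary case where a folded average is exactly zero is lumped in with "sticky" for simplicity.

```python
def spider_distance(p, q):
    """Geodesic distance on the 3-spider: along the leg if both points
    share a leg (or one is the center), otherwise through the center."""
    (leg_p, r_p), (leg_q, r_q) = p, q
    if leg_p == leg_q or r_p == 0 or r_q == 0:
        return abs(r_p - r_q)
    return r_p + r_q


def frechet_mean(points, n_legs=3):
    """Sample Frechet mean on the spider via folded averages.

    For leg k, points on leg k count as +r and all others as -r.
    At most one folded average can be positive; if one is, the mean
    lies on that leg at that distance. Otherwise it is the center.
    """
    n = len(points)
    for k in range(n_legs):
        on_k = sum(r for leg, r in points if leg == k)
        off_k = sum(r for leg, r in points if leg != k)
        folded = (on_k - off_k) / n
        if folded > 0:
            return (k, folded), "non-sticky"
    # The leg label is arbitrary at the center; 0 is used by convention.
    return (0, 0.0), "sticky"


# Hypothetical samples, chosen only to show both behaviours.
pulled = [(0, 5.0), (0, 3.0), (1, 1.0), (2, 1.0)]   # leg 0 dominates
balanced = [(0, 1.0), (1, 1.0), (2, 1.0)]           # no leg dominates
print(frechet_mean(pulled))
print(frechet_mean(balanced))
```

In the balanced sample, small changes to any single observation leave the mean at the center, which is the behaviour the term "sticky" refers to.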

Code & Models

Repositories

GarettO9/lang_tree (Official)

Taxonomy

Topics: Authorship Attribution and Profiling · Semigroups and Automata Theory