Deep Language Geometry: Constructing a Metric Space from LLM Weights

Maksym Shamrai; Vladyslav Hamolia

arXiv:2508.11676 · cs.CL · August 19, 2025

TL;DR

This paper presents a method for constructing a metric space of languages from the internal weights of large language models. Distances in that space reflect known linguistic relationships and may hint at language evolution.

Contribution

The paper introduces a framework that derives language representations from LLM weight importance scores, capturing intrinsic linguistic features without manual feature engineering.

Findings

01 Aligns with known linguistic families

02 Reveals unexpected inter-language connections

03 Applicable across diverse datasets and multilingual LLMs

Abstract

We introduce a novel framework that utilizes the internal weight activations of modern Large Language Models (LLMs) to construct a metric space of languages. Unlike traditional approaches based on hand-crafted linguistic features, our method automatically derives high-dimensional vector representations by computing weight importance scores via an adapted pruning algorithm. Our approach captures intrinsic language characteristics that reflect linguistic phenomena. We validate our approach across diverse datasets and multilingual LLMs, covering 106 languages. The results align well with established linguistic families while also revealing unexpected inter-language connections that may indicate historical contact or language evolution. The source code, computed language latent vectors, and visualization tool are made publicly available at https://github.com/mshamrai/deep-language-geometry.
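
The pipeline the abstract describes (score weights per language, turn the scores into a vector, compare vectors) can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' implementation: the abstract does not name the pruning algorithm, so the sketch uses a Wanda-style score |W_ij| * ||X_j||_2; it binarizes the scores by keeping the top half; and it compares languages with normalized Hamming distance, which is a metric on binary vectors. The weight matrix and per-language activations are random placeholders, and all function names are hypothetical.

```python
import numpy as np


def importance_scores(weight, activations):
    """Assumed Wanda-style score |W_ij| * ||X_j||_2 for one linear layer.

    weight: (out_features, in_features)
    activations: (n_tokens, in_features), inputs collected on one language's text
    """
    feature_norms = np.linalg.norm(activations, axis=0)  # shape (in_features,)
    return np.abs(weight) * feature_norms  # broadcast across output rows


def language_vector(weight, activations, keep_ratio=0.5):
    """Binary vector marking the top `keep_ratio` fraction of weights by score."""
    scores = importance_scores(weight, activations).ravel()
    k = max(1, int(round(keep_ratio * scores.size)))
    mask = np.zeros(scores.size, dtype=np.uint8)
    mask[np.argsort(scores)[-k:]] = 1
    return mask


def hamming_distance(u, v):
    """Fraction of positions where two binary vectors differ."""
    return float(np.mean(u != v))


# Placeholder data: one shared weight matrix, random activations per language.
rng = np.random.default_rng(0)
weight = rng.normal(size=(8, 16))
names = ["lang_a", "lang_b", "lang_c"]  # hypothetical language labels
vectors = {
    name: language_vector(
        weight,
        rng.normal(size=(32, 16)) * rng.uniform(0.5, 2.0, size=16),
    )
    for name in names
}

# Pairwise distance matrix over the languages.
dist = np.array(
    [[hamming_distance(vectors[a], vectors[b]) for b in names] for a in names]
)
print(np.round(dist, 3))
```

In a real setting the activations would come from running a multilingual LLM on text in each language, and the vectors would concatenate scores across many layers. The choice of distance matters: Hamming distance on binary masks satisfies the metric axioms, whereas a similarity such as cosine does not satisfy the triangle inequality without transformation.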

Code & Models

Datasets

mshamrai/language-metric-data (dataset, 242 downloads)

Taxonomy

Topics: Natural Language Processing Techniques · Mathematics, Computing, and Information Processing · Handwritten Text Recognition Techniques