Geometric Patterns of Meaning: A PHATE Manifold Analysis of Multi-lingual Embeddings
Wen G Gong

TL;DR
This paper presents a multi-level framework using PHATE manifold learning to analyze the geometric structure of multilingual embeddings, revealing systematic patterns and limitations across linguistic levels and datasets.
Contribution
It introduces Semanscope, a visualization tool applying PHATE to uncover semantic geometry in multilingual embeddings at multiple linguistic levels.
Findings
Chinese radicals show geometric collapse, indicating that current embedding models fail to distinguish semantic components from purely structural ones.
Different writing systems exhibit distinct geometric signatures in embedding space.
Content words form clustering-branching patterns across semantic domains, while numbers follow spiral trajectories, challenging standard assumptions.
Abstract
We introduce a multi-level analysis framework for examining semantic geometry in multilingual embeddings, implemented through Semanscope (a visualization tool that applies PHATE manifold learning across four linguistic levels). Analysis of diverse datasets spanning sub-character components, alphabetic systems, semantic domains, and numerical concepts reveals systematic geometric patterns and critical limitations in current embedding models. At the sub-character level, purely structural elements (Chinese radicals) exhibit geometric collapse, highlighting model failures to distinguish semantic from structural components. At the character level, different writing systems show distinct geometric signatures. At the word level, content words form clustering-branching patterns across 20 semantic domains in English, Chinese, and German. Arabic numbers organize through spiral trajectories rather…
Taxonomy
Topics: Data Visualization and Analytics · Language and cultural evolution · Categorization, perception, and language
