The analysis of topological structure in data using persistent homology: applications to lexical word association networks
Matthew Pietrosanu

TL;DR
This paper applies persistent homology, a topological data analysis tool, to linguistic networks in a novel way. It shows that the method can identify clusters and other topological features in word association data, and it compares the method with an existing clustering algorithm.
Contribution
It presents a first-principles exposition of persistent homology, applies it to lexical word association networks, and proposes enhancements to basic persistent homology techniques that improve clustering performance.
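At the heart of a first-principles treatment of persistent homology is the reduction of a filtered boundary matrix over ℤ/2. What follows is a minimal sketch of the standard column-reduction algorithm, not code from the paper. The function name `persistence_pairs` and the input format (a list of `(filtration value, simplex)` pairs with every face listed before its cofaces) are assumptions made for illustration.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Simplex = FrozenSet[int]


def persistence_pairs(filtration: List[Tuple[float, Simplex]]) -> List[Tuple[int, float, float]]:
    """Return (dimension, birth, death) intervals; death is inf for essential classes.

    Assumes `filtration` is sorted by non-decreasing value and lists every
    face of a simplex before the simplex itself (hypothetical input format).
    """
    index: Dict[Simplex, int] = {s: i for i, (_, s) in enumerate(filtration)}
    # Column j of the boundary matrix over Z/2, stored as the set of its nonzero rows.
    columns: List[Set[int]] = [
        set() if len(s) == 1 else {index[s - {v}] for v in s} for _, s in filtration
    ]
    pivot_owner: Dict[int, int] = {}  # lowest nonzero row -> reduced column holding it
    paired: Set[int] = set()
    intervals: List[Tuple[int, float, float]] = []
    for j, col in enumerate(columns):
        # Add earlier reduced columns (mod 2) until the lowest entry is unique.
        while col and max(col) in pivot_owner:
            col ^= columns[pivot_owner[max(col)]]
        if col:
            i = max(col)  # simplex i creates a class that simplex j destroys
            pivot_owner[i] = j
            paired.update((i, j))
            birth, sigma = filtration[i]
            intervals.append((len(sigma) - 1, birth, filtration[j][0]))
    # Unpaired simplices that created a class give infinite (essential) intervals.
    for k, (value, s) in enumerate(filtration):
        if k not in paired:
            intervals.append((len(s) - 1, value, float("inf")))
    return intervals


if __name__ == "__main__":
    # A hollow triangle that gets filled in at value 3.
    tri = [(0, frozenset({0})), (0, frozenset({1})), (0, frozenset({2})),
           (1, frozenset({0, 1})), (1, frozenset({1, 2})), (2, frozenset({0, 2})),
           (3, frozenset({0, 1, 2}))]
    print(persistence_pairs(tri))
```

In this toy filtration, two connected components die when the first two edges appear. A 1-dimensional loop is born when the third edge closes the triangle and dies when the 2-simplex fills it. One component survives forever.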
Findings
Persistent homology detects meaningful clusters in word association data.
Compared to Markov clustering, persistent homology shows competitive or superior clustering ability.
Methodological improvements increase the efficacy of persistent homology in linguistic applications.
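Clusters in this sense correspond to 0-dimensional persistent homology. The sketch below is minimal and assumes the association network is given as weighted edges whose weight is a distance, for example 1 minus an association strength. The paper's actual conversion is not stated here. Every word is born at 0, and each edge that merges two components ends one bar. Cutting the filtration at a threshold ε gives the clusters, which makes this equivalent to single-linkage clustering. The names `h0_barcode` and `clusters_at` are hypothetical, and the word pairs and distances are made-up toy values rather than data from the paper.

```python
import math
from typing import Dict, Hashable, List, Sequence, Set, Tuple

# (word, word, distance); the distance is assumed, e.g. 1 - association strength.
Edge = Tuple[Hashable, Hashable, float]


def _find(parent: Dict[Hashable, Hashable], x: Hashable) -> Hashable:
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def h0_barcode(vertices: Sequence[Hashable], edges: List[Edge]) -> List[Tuple[float, float]]:
    """0-dimensional bars of the graph filtration; all vertices are born at 0."""
    parent = {v: v for v in vertices}
    bars: List[Tuple[float, float]] = []
    for u, v, d in sorted(edges, key=lambda e: e[2]):
        ru, rv = _find(parent, u), _find(parent, v)
        if ru != rv:  # this edge merges two components, so one bar dies at d
            parent[ru] = rv
            bars.append((0.0, d))
    # Components that never merge persist forever.
    survivors = {_find(parent, v) for v in vertices}
    bars.extend((0.0, math.inf) for _ in survivors)
    return bars


def clusters_at(vertices: Sequence[Hashable], edges: List[Edge], eps: float) -> List[Set[Hashable]]:
    """Connected components after adding every edge with distance <= eps."""
    parent = {v: v for v in vertices}
    for u, v, d in edges:
        if d <= eps:
            ru, rv = _find(parent, u), _find(parent, v)
            if ru != rv:
                parent[ru] = rv
    groups: Dict[Hashable, Set[Hashable]] = {}
    for v in vertices:
        groups.setdefault(_find(parent, v), set()).add(v)
    # Sort for a deterministic order (assumes labels are mutually comparable).
    return sorted(groups.values(), key=lambda g: sorted(g))


if __name__ == "__main__":
    words = ["cat", "dog", "mouse", "bread", "butter"]
    toy_edges = [("cat", "dog", 0.2), ("dog", "mouse", 0.3),
                 ("bread", "butter", 0.1), ("mouse", "bread", 0.9)]
    print(h0_barcode(words, toy_edges))
    print(clusters_at(words, toy_edges, 0.5))
```

Long bars in the barcode indicate groups of words that stay separate across a wide range of thresholds. One plausible way to choose ε is to cut inside such a gap.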
Abstract
Persistent homology is a technique, recently developed in algebraic and computational topology, that is well suited to analysing structure in complex, high-dimensional data. In this paper, we exposit the theory of persistent homology from first principles and detail a novel application of this method to the field of computational linguistics. Using this method, we search for clusters and other topological features among closely associated words of the English language. Furthermore, we compare the clustering abilities of persistent homology and the commonly used Markov clustering algorithm, and we discuss improvements to basic persistent homology techniques that increase its clustering efficacy.
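A minimal Markov clustering (MCL) sketch may help with the comparison the abstract mentions. The procedure adds self-loops and column-normalizes the adjacency matrix. It then alternates expansion (a matrix power) with inflation (an entrywise power followed by renormalization) until the matrix stops changing, and it reads clusters off the rows that retain mass. This is a plain textbook version. The defaults (`expansion=2`, `inflation=2.0`), the convergence tolerance, and the 1e-6 readout threshold are assumptions and not the settings used in the paper.

```python
from typing import FrozenSet, List

Matrix = List[List[float]]


def _normalize_columns(m: Matrix) -> Matrix:
    """Scale each column to sum to 1 (column-stochastic); all-zero columns stay zero."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)] for i in range(n)]


def _matmul(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mcl(adj: Matrix, expansion: int = 2, inflation: float = 2.0,
        max_iter: int = 100, tol: float = 1e-9) -> List[FrozenSet[int]]:
    """Basic Markov clustering on a symmetric adjacency matrix (illustrative defaults)."""
    n = len(adj)
    # Adding self-loops is a common MCL preprocessing step that damps oscillation.
    m = _normalize_columns([[float(adj[i][j]) + (1.0 if i == j else 0.0)
                             for j in range(n)] for i in range(n)])
    for _step in range(max_iter):
        expanded = m
        for _power in range(expansion - 1):  # expansion: M^e spreads flow along longer walks
            expanded = _matmul(expanded, m)
        # Inflation: entrywise power then renormalize, which strengthens strong flows.
        new = _normalize_columns([[x ** inflation for x in row] for row in expanded])
        converged = max(abs(new[i][j] - m[i][j]) for i in range(n) for j in range(n)) < tol
        m = new
        if converged:
            break
    # Rows that keep mass act as attractors; the columns they hold form one cluster.
    clusters: List[FrozenSet[int]] = []
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members and members not in clusters:
            clusters.append(members)
    return clusters


if __name__ == "__main__":
    # Two disconnected triangles: nodes 0-2 and nodes 3-5.
    two_triangles = [[0, 1, 1, 0, 0, 0], [1, 0, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0],
                     [0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 0, 1], [0, 0, 0, 1, 1, 0]]
    print(mcl(two_triangles))
```

The two methods are controlled differently. MCL's granularity is governed mainly by the inflation parameter, whereas the persistence approach exposes a whole range of distance thresholds at once.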
Taxonomy
Topics: Topological and Geometric Data Analysis · Homotopy and Cohomology in Algebraic Topology · Advanced Neuroimaging Techniques and Applications
