A Note on Local Ultrametricity in Text
Fionn Murtagh

TL;DR
This note measures how far local ultrametric structure is present in text, with the aim of finding unique "fingerprints" for a text or corpus and discriminating between texts from different domains. It uses coherent collections of over 1000 texts, comprising over 1.3 million words.
Contribution
It studies the extent of local ultrametric topology in texts, aiming to find unique hierarchical signatures, to discriminate between text domains, and to open up the possibility of exploiting hierarchical structures in the data.
Findings
Identification of local ultrametric structures in large text collections
Potential for text fingerprinting and domain discrimination
Insights into hierarchical organization of textual data
Abstract
High dimensional, sparsely populated data spaces have been characterized in terms of ultrametric topology. This implies that there are natural, not necessarily unique, tree or hierarchy structures defined by the ultrametric topology. In this note we study the extent of local ultrametric topology in texts, with the aim of finding unique "fingerprints" for a text or corpus, discriminating between texts from different domains, and opening up the possibility of exploiting hierarchical structures in the data. We use coherent and meaningful collections of over 1000 texts, comprising over 1.3 million words.
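To make "extent of ultrametric topology" concrete, here is a minimal Python sketch of one way such a quantity could be estimated. It is an illustration under stated assumptions, not the procedure used in the paper. A distance is ultrametric when every triangle satisfies the strong triangle inequality d(x, z) <= max(d(x, y), d(y, z)), which is the same as saying the two longest sides of every triangle are equal. The sketch samples triplets and reports the fraction that pass this test within a relative tolerance. The function names, the tolerance `rel_tol`, and the toy prefix distance are all hypothetical choices made for this example.

```python
import itertools
import random


def is_ultrametric_triangle(d_xy, d_yz, d_xz, rel_tol=0.05):
    """Return True if the two longest sides are equal within rel_tol.

    Up to the tolerance, this is the strong triangle inequality
    d(x, z) <= max(d(x, y), d(y, z)) for every ordering of the points.
    """
    _, b, c = sorted((d_xy, d_yz, d_xz))
    if c == 0:
        return True  # all three points coincide
    return (c - b) <= rel_tol * c


def ultrametric_fraction(points, dist, n_triplets=1000, rel_tol=0.05, seed=0):
    """Estimate the share of randomly sampled triplets that are ultrametric."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_triplets):
        x, y, z = rng.sample(points, 3)
        if is_ultrametric_triangle(dist(x, y), dist(y, z), dist(x, z), rel_tol):
            hits += 1
    return hits / n_triplets


def prefix_distance(s, t):
    """Toy ultrametric on equal-length strings: 2 ** -(common prefix length)."""
    if s == t:
        return 0.0
    k = 0
    while k < min(len(s), len(t)) and s[k] == t[k]:
        k += 1
    return 2.0 ** -k


# Toy data (an assumption for illustration): all binary strings of length 4.
words = ["".join(bits) for bits in itertools.product("01", repeat=4)]
fraction = ultrametric_fraction(words, prefix_distance, n_triplets=200)
```

The prefix distance is exactly ultrametric, so every sampled triplet passes. For real text one would replace `words` and `prefix_distance` with document or term vectors and a distance of one's choosing, and a fraction well below 1 would be expected.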
Taxonomy
Topics: advanced mathematical theories · Mathematical Dynamics and Fractals · Mathematical and Theoretical Analysis
