Measures of string similarities based on the Hamming distance
Bojan Nikolić, Boris Šobot

TL;DR
This paper introduces a new similarity measure for two sets of strings, built from the Hamming distance and persistent homology. The measure accounts for both the overlap of barcode bars and whether the matched bars represent similar homological features.
Contribution
It presents a novel similarity measure based on barcode comparison and introduces the separation of simplex radii technique for enhanced homological feature matching.
Findings
The new measure effectively captures qualitative homological features.
The separation of simplex radii improves string set comparison.
The method demonstrates robustness in similarity assessment.
Abstract
In this paper we consider measures of similarity between two sets of strings, built on the Hamming distance and the tools of persistent homology. First we describe the construction of the Čech filtration associated with a set of strings, the persistence module corresponding to this filtration, and its barcode structure. Using these tools, we introduce a novel similarity measure for two sets of strings, based on a comparison of bars of the same dimension within their barcodes. Our idea is to look for a comparison that takes into account not only the overlap of bars, but also ensures that the observed bars are qualitatively matched, in the sense that they represent similar homological features. To realize this idea, we developed a method called the separation of simplex radii technique.
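To make the ingredients named in the abstract concrete, here is a minimal Python sketch, not the authors' construction. It computes the Hamming distance, derives the finite 0-dimensional bars of a string set by single-linkage merging, and scores the overlap of two bars. Several things here are assumptions made for illustration: the function names `hamming`, `h0_bars`, and `bar_overlap` are hypothetical, an edge is taken to appear at a scale equal to the Hamming distance (a Vietoris-Rips-style simplification rather than the Čech filtration the paper uses), and the Jaccard-style overlap score is a generic stand-in that ignores the qualitative matching and the separation of simplex radii technique.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def h0_bars(strings):
    """Finite 0-dimensional bars (birth, death) of a string set.

    Assumption: an edge between two strings enters the filtration at a
    scale equal to their Hamming distance (Rips-style, not Cech).
    Every component is born at 0; the single infinite bar is omitted.
    """
    parent = list(range(len(strings)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (hamming(strings[i], strings[j]), i, j)
        for i, j in combinations(range(len(strings)), 2)
    )
    bars = []
    for dist, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # Two components merge, so one of them dies at this scale.
            parent[root_i] = root_j
            bars.append((0, dist))
    return bars


def bar_overlap(bar_a, bar_b) -> float:
    """Jaccard-style overlap of two finite bars (illustrative only)."""
    intersection = max(0, min(bar_a[1], bar_b[1]) - max(bar_a[0], bar_b[0]))
    union = max(bar_a[1], bar_b[1]) - min(bar_a[0], bar_b[0])
    return intersection / union if union > 0 else 1.0


bars = h0_bars(["0000", "0001", "0111", "1111"])
print(bars)                            # finite H0 bars of the example set
print(bar_overlap(bars[0], bars[-1]))  # overlap of the shortest and longest bar
```

An overlap score of this kind only looks at the intervals themselves, so two bars of equal length can score highly even when they come from unrelated features. That limitation is the one the abstract says the proposed measure is designed to address.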
Taxonomy
Topics: Topological and Geometric Data Analysis
