
TL;DR
This paper introduces a multi-scale string metric built from angle distances between n-gram count vectors, proves its metric and stability properties, gives a linear-time algorithm for evaluating it, and benchmarks it against existing methods.
Contribution
It proposes a novel weighted angle distance metric on strings, with proven properties, efficient computation, and empirical benchmarking against standard baselines.
Findings
The metric is robust under tandem-repeat stutters.
The proposed algorithm runs in linear time using suffix trees.
Benchmark results show competitive performance against edit and n-gram baselines.
Abstract
We define a multi-scale metric on strings by aggregating angle distances between all n-gram count vectors with exponential weights. We benchmark it in DBSCAN clustering against edit and n-gram baselines, give a linear-time suffix-tree algorithm for evaluation, prove metric and stability properties (including robustness under tandem-repeat stutters), and characterize isometries.
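To make the construction concrete, here is a minimal Python sketch of one way such a distance could be computed. It is an illustration under stated assumptions, not the paper's algorithm: the weight on scale n is assumed to be rho**n with 0 < rho < 1, the sum is truncated at the longer string's length, and an empty count vector is assumed to sit at angle pi/2 from any non-empty one. The function names and the parameters rho and max_n are hypothetical. The sketch counts n-grams naively and does not implement the linear-time suffix-tree evaluation.

```python
import math
from collections import Counter


def ngram_counts(s, n):
    """Count every length-n substring of s (empty if n > len(s))."""
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def angle_distance(u, v):
    """Angle in radians between two sparse count vectors."""
    uu = sum(c * c for c in u.values())  # squared norm of u
    vv = sum(c * c for c in v.values())  # squared norm of v
    if uu == 0 and vv == 0:
        return 0.0
    if uu == 0 or vv == 0:
        # Assumed convention: an empty vector is orthogonal to everything.
        return math.pi / 2
    dot = sum(c * v[g] for g, c in u.items())
    cos = dot / math.sqrt(uu * vv)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos)))


def multiscale_distance(s, t, rho=0.5, max_n=None):
    """Weighted sum of n-gram angle distances, with assumed weights rho**n."""
    if max_n is None:
        max_n = max(len(s), len(t))
    return sum(
        rho ** n * angle_distance(ngram_counts(s, n), ngram_counts(t, n))
        for n in range(1, max_n + 1)
    )


if __name__ == "__main__":
    print(multiscale_distance("abab", "ababab"))  # tandem-repeat stutter
    print(multiscale_distance("abab", "xyxy"))    # disjoint alphabets
```

Because angles ignore vector length, doubling a repeated unit leaves the low-order n-gram directions nearly unchanged, which gives some intuition for the robustness under tandem-repeat stutters that the abstract claims.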
