Distance-to-Distance Ratio: A Similarity Measure for Sentences Based on Rate of Change in LLM Embeddings
Abdullah Qureshi, Kenneth Rice, and Alexander Wolpert

TL;DR
This paper introduces the distance-to-distance ratio (DDR), a new similarity measure for LLM sentence embeddings that better aligns with human perception by capturing the semantic influence of context through rate of change analysis.
Contribution
The paper presents DDR, a novel similarity metric inspired by Lipschitz continuity, which improves semantic discrimination in sentence embeddings compared to existing methods.
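For reference, the standard definition of Lipschitz continuity that the contribution alludes to is sketched below. This page does not reproduce the paper's exact DDR formula, so the symbols here (a map f between metric spaces with distances d_X and d_Y, and a constant K) are generic textbook notation, not the authors'.

```latex
d_Y\bigl(f(x_1), f(x_2)\bigr) \le K \, d_X(x_1, x_2)
\quad \text{for all } x_1, x_2 \in X
\qquad\Longleftrightarrow\qquad
\sup_{x_1 \ne x_2} \frac{d_Y\bigl(f(x_1), f(x_2)\bigr)}{d_X(x_1, x_2)} \le K
```

Read this way, the ratio of output distance to input distance bounds how quickly f can change, which is presumably the sense in which DDR is described as a rate-of-change measure.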
Findings
DDR outperforms existing similarity metrics in distinguishing semantic similarity.
DDR maintains robustness under minimal, controlled text perturbations.
DDR provides finer discrimination between similar and dissimilar texts.
Abstract
A measure of similarity between text embeddings can be considered adequate only if it adheres to the human perception of similarity between texts. In this paper, we introduce the distance-to-distance ratio (DDR), a novel measure of similarity between LLM sentence embeddings. Inspired by Lipschitz continuity, DDR measures the rate of change between the similarity of pre-context word embeddings and the similarity of post-context LLM embeddings, thus measuring the semantic influence of context. We evaluate the performance of DDR in experiments designed as a series of perturbations applied to sentences drawn from a sentence dataset. For each sentence, we generate variants by replacing one, two, or three words with either synonyms, which constitute semantically similar text, or randomly chosen words, which constitute semantically dissimilar text. We compare the performance of DDR with…
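The perturbation setup and the distance-ratio idea in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the helper names `replace_words` and `distance_ratio`, the use of Euclidean distance, and the direction of the ratio (post-context distance over pre-context distance) are not taken from the paper, whose exact DDR definition is not given on this page.

```python
import random
import numpy as np

def replace_words(tokens, k, pick_replacement, rng):
    """Return a copy of `tokens` with k randomly chosen positions replaced.

    `pick_replacement` is a hypothetical callback mapping a word to its
    substitute (a synonym lookup or a random-word sampler, for example).
    """
    positions = rng.sample(range(len(tokens)), k)
    out = list(tokens)
    for i in positions:
        out[i] = pick_replacement(tokens[i])
    return out

def distance_ratio(pre_a, pre_b, post_a, post_b, eps=1e-12):
    """Lipschitz-style ratio: post-context distance over pre-context distance.

    Assumption: Euclidean distance and this ratio direction are illustrative
    choices, not the paper's definition of DDR.
    """
    d_pre = np.linalg.norm(np.asarray(pre_a, float) - np.asarray(pre_b, float))
    d_post = np.linalg.norm(np.asarray(post_a, float) - np.asarray(post_b, float))
    # eps guards against division by zero when the inputs coincide.
    return float(d_post / max(d_pre, eps))

# Usage with made-up vectors (not real embeddings):
rng = random.Random(0)
variant = replace_words(["the", "cat", "sat", "down"], 2, str.upper, rng)
ratio = distance_ratio([0, 0], [3, 4], [0, 0], [6, 8])  # 10 / 5 = 2.0
```

In an actual experiment the four vectors would come from a static word-embedding layer (pre-context) and an LLM encoder (post-context) applied to a sentence and its perturbed variant; how the paper aggregates word vectors into a sentence-level pre-context representation is not stated here.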
Taxonomy
Topics: Topic Modeling · Authorship Attribution and Profiling · Text Readability and Simplification
