On the size of the neighborhoods of a word

Cedric Chauve; Louxin Zhang

arXiv:2505.13796 · math.CO · September 5, 2025

TL;DR

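This paper derives exact formulas for the sizes of the condensed and super-condensed neighborhoods of unary words under the Levenshtein distance, together with upper bounds on the maximum size of the condensed neighborhood of an arbitrary word of a given length. Such bounds can be used in the complexity analysis of approximate pattern matching algorithms.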

Contribution

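It provides exact formulas for the condensed and super-condensed neighborhoods of a unary word, a new upper bound on the maximum size of the condensed neighborhood of an arbitrary word of a given length, and a proof of a previously conjectured upper bound on that same maximum size.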

Findings

01

Exact formulas for the sizes of the condensed and super-condensed neighborhoods of a unary word.

02

A new upper bound on the maximum size of the condensed neighborhood of an arbitrary word of a given length.

03

Proof of a conjectured upper bound on the maximum size of the condensed neighborhood of an arbitrary word of a given length.

Abstract

The d-neighborhood of a word W in the Levenshtein distance is the set of all words at distance at most d from W. Generating the neighborhood of a word W, or related sets of words such as the condensed neighborhood or the super-condensed neighborhood, has applications in the design of approximate pattern matching algorithms. It follows that bounds on the maximum size of the neighborhood of words of a given length can be used in the complexity analysis of such approximate pattern matching algorithms. In this note, we present exact formulas for the size of the condensed and super-condensed neighborhoods of a unary word and a novel upper bound for the maximum size of the condensed neighborhood of an arbitrary word of a given length, and we prove a conjectured upper bound, also for the maximum size of the condensed neighborhood of an arbitrary word of a given length.
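
To make the definition of the d-neighborhood concrete, here is a minimal brute-force sketch in Python. It is not the authors' method and is not taken from the paper: the function names `levenshtein` and `neighborhood`, and the choice to pass the alphabet as a string, are assumptions made for illustration. It enumerates every candidate word whose length differs from that of W by at most d and keeps those at Levenshtein distance at most d, so it is only practical for very short words and small alphabets.

```python
from itertools import product


def levenshtein(a: str, b: str) -> int:
    """Levenshtein distance via the standard row-by-row dynamic program."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = cur
    return prev[-1]


def neighborhood(word: str, d: int, alphabet: str) -> set[str]:
    """All words over `alphabet` at Levenshtein distance at most d from `word`.

    Illustrative brute force (hypothetical helper, exponential time):
    a word at distance <= d must have length within d of len(word).
    """
    result = set()
    for length in range(max(0, len(word) - d), len(word) + d + 1):
        for letters in product(alphabet, repeat=length):
            candidate = "".join(letters)
            if levenshtein(word, candidate) <= d:
                result.add(candidate)
    return result


if __name__ == "__main__":
    # 1-neighborhood of the unary word "aa" over the alphabet {a, b}
    print(sorted(neighborhood("aa", 1, "ab")))
```

For the unary word `aa` with d = 1 over the alphabet {a, b}, this sketch yields the eight words a, aa, ab, ba, aaa, aab, aba, baa. The condensed and super-condensed neighborhoods studied in the paper are subsets of this set; their definitions are not given on this page, so they are not sketched here.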

Taxonomy

Topics: semigroups and automata theory · Algorithms and Data Compression · Natural Language Processing Techniques