Merging Sorted Lists of Similar Strings

Gene Myers

arXiv:2208.09351·cs.DS·August 22, 2022

TL;DR

This paper introduces new algorithms for merging multiple sorted string lists with many similar elements, achieving improved worst-case efficiency by leveraging auxiliary information and reducing character comparisons.

Contribution

The paper presents merging algorithms that keep simple auxiliary information with each heap node to exploit string similarity, outperforming a traditional heap-based merge when the input lists contain many identical or nearly identical strings.

Findings

01

Algorithms achieve worst-case $O(M \, \log T + S)$ time complexity.

02

The second method, which runs in $O(M \log(T / \bar{e}) + S)$ time with $\bar{e} = M / N$, reaches linear time when all input lists are identical.

03

Practical performance surpasses trie-based approaches.
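
As a short worked check of the linear-time claim, using the abstract's symbols ($T$ lists, $M$ input elements in total, $N$ distinct output elements, $S$ the sum of all string lengths): if all $T$ lists are identical, every output element appears once in each list, so $M = T N$. Substituting into the second bound, and assuming the logarithm is read as stated in the abstract:

$$
\bar{e} = \frac{M}{N} = \frac{T N}{N} = T
\quad\Longrightarrow\quad
O\!\left(M \log \frac{T}{\bar{e}} + S\right) = O(M \log 1 + S) = O(S).
$$

At the other extreme, when no element is shared between lists, $N = M$ and $\bar{e} = 1$, so the bound falls back to $O(M \log T + S)$.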

Abstract

Merging $T$ sorted, non-redundant lists containing $M$ elements into a single sorted, non-redundant result of size $N \geq M / T$ is a classic problem typically solved practically in $O(M \log T)$ time with a priority-queue data structure, the most basic of which is the simple *heap*. We revisit this problem in the situation where the list elements are *strings* and the lists contain many *identical or nearly identical elements*. By keeping simple auxiliary information with each heap node, we devise an $O(M \log T + S)$ worst-case method that performs no more character comparisons than the sum of the lengths of all the strings $S$, and another $O(M \log(T / \bar{e}) + S)$ method that becomes progressively more efficient as a function of the fraction of equal elements $\bar{e} = M / N$ between input lists, reaching linear time when the lists are all identical. The methods perform favorably in…
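
For orientation, the classic heap-based baseline that the abstract starts from can be sketched as follows. This is a minimal illustration using Python's standard `heapq` module, not the paper's algorithm: the function name `merge_unique` is hypothetical, and the sketch keeps none of the auxiliary per-node information the paper describes. It also reports $\bar{e} = M / N$ for the given input.

```python
import heapq
from typing import List, Sequence, Tuple


def merge_unique(lists: Sequence[Sequence[str]]) -> Tuple[List[str], float]:
    """Merge sorted, duplicate-free lists into one sorted, duplicate-free list.

    Baseline only (hypothetical helper, not the paper's method).
    Returns the merged list and e_bar = M / N, where M is the total number
    of input elements and N is the number of distinct output elements.
    """
    # One heap entry per non-empty list: (current string, list index, position).
    heap = [(lst[0], i, 0) for i, lst in enumerate(lists) if lst]
    heapq.heapify(heap)

    total = sum(len(lst) for lst in lists)  # M
    merged: List[str] = []

    while heap:
        value, i, pos = heap[0]
        # Equal strings leave the heap consecutively, so comparing with the
        # last output element is enough to drop duplicates.
        if not merged or merged[-1] != value:
            merged.append(value)
        if pos + 1 < len(lists[i]):
            # Advance list i. Each tuple comparison inside the heap rescans
            # the strings from their first character, which is the repeated
            # work that similar strings make expensive.
            heapq.heapreplace(heap, (lists[i][pos + 1], i, pos + 1))
        else:
            heapq.heappop(heap)

    e_bar = total / len(merged) if merged else 0.0
    return merged, e_bar


if __name__ == "__main__":
    reads = [
        ["acgt", "acgtt", "ccg"],
        ["acgt", "ccg", "ttt"],
        ["acgt", "acgtt"],
    ]
    result, e_bar = merge_unique(reads)
    print(result, e_bar)
```

With the three example lists, $M = 8$ and the merged result has $N = 4$ elements, so the sketch reports $\bar{e} = 2.0$. Every heap operation here costs $O(\log T)$ string comparisons, and each comparison may re-read a long shared prefix; the abstract's methods bound the total number of character comparisons by $S$ instead.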

Code & Models

Repositories

thegenemyers/string.heap (Official)
