TL;DR
This paper introduces new algorithms for merging multiple sorted string lists with many similar elements, achieving improved worst-case efficiency by leveraging auxiliary information and reducing character comparisons.
Contribution
The paper presents novel merging algorithms that exploit string similarity to improve efficiency, outperforming traditional heap-based methods in specific scenarios.
Findings
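The first method runs in worst-case $O(M \log T + S)$ time, where $T$ is the number of input lists, $M$ the number of elements they contain, and $S$ the sum of the lengths of all the strings.
A second method becomes progressively more efficient as the fraction of equal elements between input lists grows, reaching linear time when all input lists are identical.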
In practice, the methods compare favorably with trie-based approaches.
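To make the role of the $S$ term more concrete, here is a minimal sketch of an LCP-aware string comparison, a well-known technique for avoiding repeated character comparisons between similar strings. It is not taken from the paper: the function names `lcp_from` and `lcp_compare` are hypothetical, and the sketch assumes both strings are known to be greater than or equal to a common reference string `r` (for example, the last string written to the output) and that their longest-common-prefix lengths with `r` are already cached.

```python
from typing import Tuple


def lcp_from(a: str, b: str, start: int) -> int:
    """Length of the longest common prefix of a and b.

    Assumes a[:start] == b[:start], so scanning can begin at index start.
    """
    i = start
    n = min(len(a), len(b))
    while i < n and a[i] == b[i]:
        i += 1
    return i


def lcp_compare(a: str, la: int, b: str, lb: int) -> Tuple[int, int]:
    """Compare a and b using cached LCP lengths against a reference r.

    Preconditions (assumptions of this sketch): r <= a, r <= b,
    la == lcp(a, r) and lb == lcp(b, r).
    Returns (sign, lcp(a, b)) with sign in {-1, 0, 1}.
    """
    if la > lb:
        # a agrees with r past the point where b already exceeds r.
        return -1, lb
    if la < lb:
        return 1, la
    # Equal cached LCPs: the first la characters match, so skip them.
    k = lcp_from(a, b, la)
    if k == len(a) and k == len(b):
        return 0, k
    if k == len(a):
        return -1, k  # a is a proper prefix of b
    if k == len(b):
        return 1, k   # b is a proper prefix of a
    return (-1 if a[k] < b[k] else 1), k
```

When the cached lengths differ, the order is decided without touching a single character; when they match, the scan resumes where the shared prefix ends rather than at index 0. Whether the paper's auxiliary heap-node information takes exactly this form is not stated in the summary above.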
Abstract
Merging $T$ sorted, non-redundant lists containing $M$ elements into a single sorted, non-redundant result is a classic problem, typically solved in practice in $O(M \log T)$ time with a priority-queue data structure, the most basic of which is the simple *heap*. We revisit this problem in the situation where the list elements are *strings* and the lists contain many *identical or nearly identical elements*. By keeping simple auxiliary information with each heap node, we devise an $O(M \log T + S)$ worst-case method that performs no more character comparisons than the sum of the lengths of all the strings $S$, and another method that becomes progressively more efficient as a function of the fraction of equal elements between input lists, reaching linear time when the lists are all identical. The methods perform favorably in…
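For reference, the classic heap-based baseline that the abstract starts from can be sketched as follows. This is an illustrative implementation using Python's standard `heapq` module, not the paper's code; the function name `merge_sorted_unique` is hypothetical. Note that every heap operation here compares whole strings from their first character, which is the repeated work the paper's methods aim to avoid when many elements are identical or nearly identical.

```python
import heapq
from typing import List, Tuple


def merge_sorted_unique(lists: List[List[str]]) -> List[str]:
    """Merge sorted, duplicate-free string lists into one sorted,
    duplicate-free list using a binary heap."""
    # Each heap entry is (string, list index, position in that list).
    heap: List[Tuple[str, int, int]] = [
        (lst[0], idx, 0) for idx, lst in enumerate(lists) if lst
    ]
    heapq.heapify(heap)

    out: List[str] = []
    while heap:
        s, idx, pos = heapq.heappop(heap)
        # Equal strings from different lists pop consecutively,
        # so comparing with the last output removes duplicates.
        if not out or out[-1] != s:
            out.append(s)
        pos += 1
        if pos < len(lists[idx]):
            heapq.heappush(heap, (lists[idx][pos], idx, pos))
    return out


merged = merge_sorted_unique([
    ["apple", "apply"],
    ["apple", "banana"],
    ["apply", "cherry"],
])
# merged == ["apple", "apply", "banana", "cherry"]
```

The heap never holds more than one entry per input list, so each push or pop costs a logarithmic number of string comparisons in the number of lists.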
