Optimal-Hash Exact String Matching Algorithms

Thierry Lecroq

arXiv:2303.05799·cs.DS·March 13, 2023·1 cites

TL;DR

This paper introduces optimized hash-based string matching algorithms that improve speed for short patterns on large alphabets by ensuring unique hash values for pattern q-grams.

Contribution

It presents a novel approach to select minimal q-gram lengths for hashing, enhancing the efficiency of existing string matching algorithms.

Findings

01

Faster matching for short patterns on large alphabets.

02

Unique hash values for pattern q-grams improve algorithm performance.

03

In that setting, the new algorithms outperform earlier algorithms of the HASH family.

Abstract

String matching is the problem of finding all the occurrences of a pattern in a text. We propose improved versions of the fast family of string matching algorithms based on hashing $q$-grams. The improvement consists of considering minimal values of $q$ such that each $q$-gram of the pattern has a unique hash value. The new algorithms are faster than the algorithms of the HASH family for short patterns on large alphabets.
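
The selection step described in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the shift-and-add hash `qgram_hash`, the `table_bits` parameter, and the function name `minimal_unique_q` are all assumptions made for illustration, and "unique" is read here as "the hash values of the $q$-grams at all positions of the pattern are pairwise distinct".

```python
from typing import Optional


def qgram_hash(gram: str, table_bits: int = 8) -> int:
    """Illustrative shift-and-add hash, masked to 2**table_bits values.

    Assumption: this hash is a stand-in chosen for the sketch,
    not the hash function defined in the paper.
    """
    mask = (1 << table_bits) - 1
    h = 0
    for ch in gram:
        h = ((h << 1) + ord(ch)) & mask
    return h


def minimal_unique_q(pattern: str,
                     table_bits: int = 8,
                     max_q: Optional[int] = None) -> Optional[int]:
    """Return the smallest q such that all q-grams of `pattern`
    (one per starting position) hash to pairwise distinct values,
    or None if no q up to the limit qualifies.
    """
    m = len(pattern)
    limit = m if max_q is None else min(max_q, m)
    for q in range(1, limit + 1):
        hashes = [qgram_hash(pattern[i:i + q], table_bits)
                  for i in range(m - q + 1)]
        # All positions distinct <=> no hash value is repeated.
        if len(set(hashes)) == len(hashes):
            return q
    return None
```

Under these assumptions, `minimal_unique_q("abcd")` returns 1 because the four characters already hash to distinct values, while a repetitive pattern such as `"abab"` needs a larger `q`. Capping the search with `max_q` can yield `None`, in which case a fallback to a fixed-`q` algorithm would be one plausible design choice.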

Code & Models

Repositories

lecroq/ohash
Official

Taxonomy

Topics: Algorithms and Data Compression · Network Packet Processing and Optimization · DNA and Biological Computing