Technology Beats Algorithms (in Exact String Matching)
Jorma Tarhio, Jan Holub, Emanuele Giaquinta

TL;DR
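This paper shows experimentally that the naive string matching algorithm, implemented with SIMD instructions and a tuned symbol comparison order, is the fastest of the algorithms tested for patterns of up to about 50 symbols, and remains highly competitive for longer patterns and small alphabets.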
Contribution
It presents a naive string matching algorithm that uses SSE2 or AVX2 instructions to compare 16 or 32 characters in parallel, applies loop peeling to speed up the search phase, and checks pattern symbols in a chosen order rather than left to right.
Findings
The SIMD naive algorithm is the fastest for patterns of up to about 50 symbols.
It remains extremely good for longer patterns and small alphabets.
Of the comparison orders tried, checking pattern symbols in increasing order of their probability in the text performed best.
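The last finding can be made concrete with a small sketch. The code below builds a comparison order that puts the pattern positions holding the rarest symbols first. It is an illustration only: the function name `rarest_first_order` is hypothetical, and estimating probabilities by counting bytes in a sample of the text is an assumption, since the summary does not say how the authors obtain them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the paper): fill order[0..m) with the
 * positions of pat sorted by increasing frequency of pat[j] in sample.
 * Frequencies are raw byte counts over sample[0..n), which is an
 * assumed stand-in for "probability in the text". */
void rarest_first_order(const unsigned char *sample, size_t n,
                        const unsigned char *pat, size_t m,
                        size_t *order)
{
    assert(sample != NULL && pat != NULL && order != NULL);

    size_t freq[256] = {0};
    for (size_t i = 0; i < n; i++)
        freq[sample[i]]++;

    for (size_t j = 0; j < m; j++)
        order[j] = j;

    /* Stable insertion sort: patterns are short, so O(m^2) is acceptable.
     * Ties keep their left-to-right order. */
    for (size_t a = 1; a < m; a++) {
        size_t j = order[a];
        size_t b = a;
        while (b > 0 && freq[pat[order[b - 1]]] > freq[pat[j]]) {
            order[b] = order[b - 1];
            b--;
        }
        order[b] = j;
    }
}
```

Checking a rare symbol first means a mismatch is found sooner on average, so fewer comparisons are spent on positions that cannot match.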
Abstract
More than 120 algorithms have been developed for exact string matching within the last 40 years. We show by experiments that the naive algorithm exploiting SIMD instructions of modern CPUs (with symbols compared in a special order) is the fastest one for patterns of length up to about 50 symbols and extremely good for longer patterns and small alphabets. The algorithm compares 16 or 32 characters in parallel by applying SSE2 or AVX2 instructions, respectively. Moreover, it uses loop peeling to further speed up the searching phase. We tried several orders for comparisons of pattern symbols and the increasing order of their probabilities in the text was the best.
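The core idea in the abstract, testing 16 alignments of the pattern at once with SSE2, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `simd_naive_count` and the caller-supplied `order` array (a permutation of the pattern positions) are hypothetical, loop peeling is omitted, and the last few positions fall back to a scalar `memcmp` so that no load reads past the end of the text.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: count occurrences of pat[0..m) in text[0..n).
 * order[0..m) gives the sequence in which pattern positions are checked. */
size_t simd_naive_count(const unsigned char *text, size_t n,
                        const unsigned char *pat, size_t m,
                        const size_t *order)
{
    assert(m >= 1 && order != NULL);
    if (n < m)
        return 0;

    size_t count = 0;
    size_t i = 0;
    size_t last = n - m;  /* last valid start position */

    /* Each iteration tests start positions i..i+15 together. The highest
     * byte read is text[i + 15 + m - 1], which is in bounds when
     * i + 15 <= last. */
    for (; i + 15 <= last; i += 16) {
        unsigned mask = 0xFFFFu;  /* bit b set: position i+b still matches */
        for (size_t k = 0; k < m && mask != 0; k++) {
            size_t j = order[k];
            __m128i t = _mm_loadu_si128((const __m128i *)(text + i + j));
            __m128i p = _mm_set1_epi8((char)pat[j]);
            mask &= (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(t, p));
        }
        while (mask != 0) {  /* count the surviving candidate positions */
            count++;
            mask &= mask - 1;
        }
    }

    /* Scalar tail for the remaining start positions. */
    for (; i <= last; i++)
        if (memcmp(text + i, pat, m) == 0)
            count++;

    return count;
}
```

The inner loop stops as soon as `mask` becomes zero, which is why the order of comparisons matters: a symbol that rarely occurs in the text clears the mask early. An AVX2 variant would follow the same shape with 32-byte loads and a 32-bit mask.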
Taxonomy
Topics: Algorithms and Data Compression · Network Packet Processing and Optimization · Natural Language Processing Techniques
