# Efficient Online String Matching Based on Characters Distance Text Sampling

**Authors:** Simone Faro, Arianna Pavone, Francesco Pio Marino

arXiv: 1908.05930 · 2019-08-19

## TL;DR

This paper introduces an algorithm for sampled string matching based on character distance sampling. It speeds up online searching by a factor of up to 9 while using additional space between 2.8% and 11% of the text size, and under suitable conditions it achieves linear worst-case and optimal average-time complexity.

## Contribution

The paper presents a sampled string matching algorithm that samples the distances between consecutive occurrences of a chosen pivot character, searches the sampled data online for the sampled pattern, and then verifies candidate occurrences against the original text.

## Key findings

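- Under suitable conditions, achieves both linear worst-case and optimal average-time complexity.
- Shows sub-linear behaviour in practice and speeds up online searching by a factor of up to 9.
- Uses limited additional space, from 11% down to 2.8% of the text size, a gain of up to 50% compared with previous solutions.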

## Abstract

Searching for all occurrences of a pattern in a text is a fundamental problem in computer science with applications in many other fields, like natural language processing, information retrieval and computational biology. Sampled string matching is an efficient approach recently introduced in order to overcome the prohibitive space requirements of an index construction, on the one hand, and drastically reduce searching time for the online solutions, on the other hand. In this paper we present a new algorithm for the sampled string matching problem, based on a characters distance sampling approach. The main idea is to sample the distances between consecutive occurrences of a given pivot character and then to search online the sampled data for any occurrence of the sampled pattern, before verifying the original text. From a theoretical point of view we prove that, under suitable conditions, our solution can achieve both linear worst-case time complexity and optimal average-time complexity. From a practical point of view it turns out that our solution shows a sub-linear behaviour in practice and speeds up online searching by a factor of up to 9, using limited additional space whose amount goes from 11% to 2.8% of the text size, with a gain up to 50% if compared with previous solutions.
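The main idea described in the abstract (sample pivot distances, search the sampled data, then verify in the text) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`pivot_positions`, `distances`, `sampled_search`) are hypothetical, the sampled sequence is scanned naively, every pivot position is kept in memory instead of a compact position mapping, and a pattern without the pivot character simply falls back to a plain scan of the text.

```python
def pivot_positions(s, pivot):
    """Return the indices at which the pivot character occurs in s."""
    return [i for i, ch in enumerate(s) if ch == pivot]


def distances(positions):
    """Return the gaps between consecutive pivot positions."""
    return [b - a for a, b in zip(positions, positions[1:])]


def sampled_search(text, pattern, pivot):
    """Find all start indices of a non-empty pattern in text.

    Illustrative sketch only: candidates come from matching the pattern's
    pivot-distance sequence against the text's, and each candidate is then
    verified against the original text.
    """
    m = len(pattern)
    text_pos = pivot_positions(text, pivot)
    pat_pos = pivot_positions(pattern, pivot)

    if not pat_pos:
        # Assumption: with no pivot in the pattern there is nothing to
        # sample, so fall back to a plain scan of the text.
        return [i for i in range(len(text) - m + 1)
                if text.startswith(pattern, i)]

    text_d = distances(text_pos)  # sampled text
    pat_d = distances(pat_pos)    # sampled pattern
    k = len(pat_d)
    offset = pat_pos[0]           # first pivot position inside the pattern

    matches = []
    # Window i aligns the pattern's first pivot with the i-th pivot of the text.
    for i in range(len(text_pos) - k):
        if text_d[i:i + k] != pat_d:
            continue
        start = text_pos[i] - offset
        # Verification step: equal distances do not imply equal characters.
        if start >= 0 and text.startswith(pattern, start):
            matches.append(start)
    return matches


print(sampled_search("abracadabra abracadabra", "abracadabra", "b"))  # [0, 12]
```

In this sketch the sampled text for pivot `b` is `[7, 5, 7]` and the sampled pattern is `[7]`, so only two windows reach the verification step. Keeping the full list of pivot positions is the simplest way to map a sampled match back to a text position, but it costs more space than the figures reported above would allow, so a real implementation would need a more compact mapping.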

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1908.05930/full.md

## Figures

21 figures with captions in the complete paper: https://tomesphere.com/paper/1908.05930/full.md

## References

21 references — full list in the complete paper: https://tomesphere.com/paper/1908.05930/full.md

---
Source: https://tomesphere.com/paper/1908.05930