Linear Approximate Pattern Matching Algorithm

Anas Al-okaily; Abdelghani Tbakhi

arXiv:2110.13802·cs.DS·July 1, 2022

Open Access

TL;DR

This paper introduces a data structure for approximate pattern matching that can be built in linear time and space and supports searches that tolerate mismatches, insertions, and deletions.

Contribution

The paper presents a data structure, buildable in linear time and space ($O(n)$), that solves the approximate pattern matching problem.

Findings

01. Achieved linear-time construction of the data structure.

02. Provided approximate matching with sublinear search costs.

03. Demonstrated practical efficiency for large data streams.

Abstract

Pattern matching is a fundamental process in almost every scientific domain. The problem involves finding the positions of a given pattern (usually short) in a reference stream of data (usually long). The matching can be exact or approximate (inexact). Exact matching searches for the pattern without allowing mismatches, insertions, or deletions of one or more characters in the pattern, while approximate matching allows them. For exact matching, several data structures that can be built in linear time and space are in practical use today. For approximate matching, the solutions proposed so far are non-linear and currently impractical. In this paper, we designed and implemented a structure that can be built in linear time and space ($O(n)$) and solves the approximate matching problem in $O(m + \frac {log_2n {(log_\Sigma…
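
To make the distinction between exact and approximate matching concrete, here is a minimal sketch of the classical dynamic-programming baseline (often attributed to Sellers), which takes $O(nm)$ time for a text of length $n$ and a pattern of length $m$. It is not the structure proposed in this paper. It only illustrates the problem being solved, and the function name `approximate_matches` and the error budget `k` are illustrative assumptions.

```python
from typing import List


def approximate_matches(pattern: str, text: str, k: int) -> List[int]:
    """Return every end position j (exclusive) such that some substring of
    text ending at j is within edit distance k of pattern.

    Baseline O(n*m) dynamic programming, NOT the paper's data structure.
    """
    m = len(pattern)
    # prev[i] = cost of aligning pattern[:i] against an empty piece of text
    prev = list(range(m + 1))
    ends = []
    for j, ch in enumerate(text, start=1):
        # curr[0] = 0 because a match may start at any text position
        curr = [0] * (m + 1)
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            curr[i] = min(
                prev[i - 1] + cost,  # match (cost 0) or mismatch (cost 1)
                prev[i] + 1,         # extra character in the text
                curr[i - 1] + 1,     # pattern character missing from the text
            )
        if curr[m] <= k:
            ends.append(j)
        prev = curr
    return ends


# Hypothetical usage: k = 0 reduces to exact matching.
print(approximate_matches("abc", "xxabcxx", 0))  # [5]
print(approximate_matches("abc", "xxabcxx", 1))  # [4, 5, 6]
```

With `k = 0` only the exact occurrence ending at position 5 is reported. With `k = 1` the neighboring end positions also qualify, because dropping the final `c` or absorbing one extra text character each costs a single edit.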

Taxonomy

Topics: Algorithms and Data Compression · Network Packet Processing and Optimization · DNA and Biological Computing