String Matching with Wildcards in the Massively Parallel Computation Model

MohammadTaghi Hajiaghayi; Hamed Saleh; Saeed Seddighin; Xiaorui Sun

arXiv:1910.11829·cs.DC·June 8, 2021

TL;DR

This paper develops efficient parallel algorithms within the MPC framework for string matching problems involving wildcards, addressing '?' (don't cares), '+' (repetitions), and '*' (any substring) with constant or logarithmic round complexity.
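To make the three wildcard types concrete, here is a small sequential reference matcher that translates a pattern into a Python regular expression. It is only a sketch for checking small cases, not one of the paper's MPC algorithms. The helper names `wildcard_to_regex` and `match_positions` are hypothetical. The semantics are assumptions not stated in this summary: '?' matches exactly one character, '+' repeats the preceding pattern character one or more times, and '*' matches any substring, possibly empty.

```python
import re

def wildcard_to_regex(pattern: str) -> str:
    """Translate a wildcard pattern into a Python regular expression.

    Assumed semantics (not spelled out in the summary above):
      '?' matches any single character,
      '+' repeats the preceding pattern character one or more times,
      '*' matches any substring, including the empty one.
    A '+' is assumed to follow a literal character.
    """
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")
        elif ch == "+":
            parts.append("+")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)

def match_positions(text: str, pattern: str) -> list[int]:
    """Return every start index in text where the pattern can begin a match."""
    # A lookahead keeps matches zero-width, so overlapping occurrences are all reported.
    rx = re.compile("(?=" + wildcard_to_regex(pattern) + ")", re.DOTALL)
    return [m.start() for m in rx.finditer(text)]

print(match_positions("abcabc", "a?c"))
print(match_positions("aaab", "a+b"))
print(match_positions("abxxcd", "b*c"))
```

A regex engine is a convenient oracle for small inputs; it says nothing about round complexity or per-machine memory, which are what the MPC results are about.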

Contribution

The paper introduces MPC algorithms for string matching with wildcards, built on FFT and reductions, that run in a constant or logarithmic number of rounds depending on the wildcard type.

Findings

01. Constant-round MPC algorithm for '?' wildcard matching using FFT.

02. Constant-round MPC algorithm for '+' wildcard matching via reduction from subset matching.

03. Logarithmic-round algorithms for '*' wildcard matching in specific cases.
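The FFT connection in the first finding can be illustrated with a classical sequential technique for '?' matching. The sketch below is that textbook idea, not necessarily the paper's exact formulation, and it makes no attempt to distribute the FFT across machines. Encode each '?' as 0 and every other character as a positive integer. The pattern then matches at position i exactly when the sum over j of p_j (p_j - t_{i+j})^2 is zero, and expanding the square turns that sum into three cross-correlations, each computable with an FFT. The function names are hypothetical, and the pure-Python FFT is meant only for small inputs, since floating-point rounding limits it on long ones.

```python
import cmath

def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return a[:]
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out

def correlate(p, t):
    """Return c[i] = sum_j p[j] * t[i + j] for every alignment i of p inside t."""
    m, n = len(p), len(t)
    size = 1
    while size < m + n:
        size *= 2
    # Correlation equals convolution with the reversed pattern.
    fp = fft([complex(x) for x in reversed(p)] + [0j] * (size - m))
    ft = fft([complex(x) for x in t] + [0j] * (size - n))
    prod = fft([x * y for x, y in zip(fp, ft)], invert=True)
    # The unnormalized inverse transform is divided by size, then rounded to an integer.
    return [round(prod[i + m - 1].real / size) for i in range(n - m + 1)]

def match_dont_care(text, pattern):
    """Start indices where pattern (with '?' wildcards) matches text.

    Assumes a non-empty pattern; '?' is a wildcard only in the pattern.
    """
    if len(pattern) > len(text):
        return []
    alphabet = sorted(set(text) | (set(pattern) - {"?"}))
    code = {c: i + 1 for i, c in enumerate(alphabet)}  # positive codes
    p = [0 if c == "?" else code[c] for c in pattern]  # wildcard -> 0
    t = [code[c] for c in text]
    # sum_j p_j (p_j - t_j)^2 = sum p^3 t - 2 sum p^2 t^2 + sum p t^3
    a = correlate([x ** 3 for x in p], t)
    b = correlate([x ** 2 for x in p], [y ** 2 for y in t])
    c = correlate(p, [y ** 3 for y in t])
    return [i for i in range(len(a)) if a[i] - 2 * b[i] + c[i] == 0]

print(match_dont_care("abracadabra", "a?a"))
```

Every term in the sum is non-negative, so the total is zero only when each non-wildcard pattern position agrees with the text, which is why a single zero test per alignment suffices.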

Abstract

We study distributed algorithms for the string matching problem in the presence of wildcard characters. Given a string T (a text), we look for all occurrences of another string P (a pattern) as a substring of T. Each wildcard character in the pattern matches a specific class of strings based on its type. String matching is one of the most fundamental problems in computer science, especially in the fields of bioinformatics and machine learning. Persistent effort has led to a variety of algorithms for the problem since the 1960s. With the rise of big data and the inevitable demand to solve problems on huge data sets, there have been many attempts to adapt classic algorithms to the MPC framework to obtain further efficiency. MPC is a recent framework for parallel computation on big data, designed to capture MapReduce-like algorithms. In this paper, we study the string matching…
