Succinct Dictionary Matching With No Slowdown

Djamal Belazzougui

arXiv:1001.2860·cs.DS·May 18, 2015

TL;DR

This paper introduces a succinct data structure for dictionary matching that keeps the O(|T| + occ) query time of the classical Aho-Corasick automaton while using far less space.

Contribution

It presents a succinct representation of the Aho-Corasick automaton that answers queries in O(|T| + occ) time while reducing space from O(m log m) bits to m(log sigma + O(1)) + O(d log(n/d)) bits, with an entropy-compressed variant.

Findings

01

Space usage is reduced to m(log sigma + O(1)) + O(d log(n/d)) bits.

02

Query time remains O(|T| + occ) despite space reduction.

03

Space can be further compressed to m(H0 + O(1)) + O(d log(n/d)) bits, with an empirical-entropy term H0 replacing log sigma.
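To get a feel for the quantities in these bounds, the sketch below computes d, n, m and sigma for a small dictionary and evaluates only the leading terms of the classical and succinct bounds. It is an illustration under stated assumptions, not the paper's construction: the function name `space_terms` is hypothetical, sigma is taken to be the number of distinct characters in the dictionary, and the O(1) bits per state and the constants hidden in the O(.) terms are dropped.

```python
from math import log2


def space_terms(patterns):
    """Leading terms of the space bounds for a dictionary (illustrative only)."""
    d = len(patterns)                   # number of strings in S
    n = sum(len(p) for p in patterns)   # total length of S
    # One automaton state per distinct prefix, including the empty prefix (root).
    m = len({p[:i] for p in patterns for i in range(len(p) + 1)})
    # Assumption: the alphabet is exactly the set of characters used in S.
    sigma = len(set("".join(patterns)))
    return {
        "d": d,
        "n": n,
        "m": m,
        "sigma": sigma,
        # Classical bound O(m log m), hidden constant ignored.
        "classical_bits": m * log2(m),
        # Succinct bound m(log sigma + O(1)) + O(d log(n/d)), constants ignored.
        "succinct_bits": m * log2(sigma) + d * log2(n / d),
    }


terms = space_terms(["he", "she", "his", "hers"])
print(terms)
```

On a dictionary this small the dropped constants dominate, so the numbers only show how the terms are formed; the gap between log m and log sigma matters when m is large relative to the alphabet.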

Abstract

The problem of dictionary matching is a classical problem in string matching: given a set S of d strings of total length n characters over a (not necessarily constant) alphabet of size sigma, build a data structure so that we can match in any text T all occurrences of strings belonging to S. The classical solution for this problem is the Aho-Corasick automaton, which finds all occ occurrences in a text T in time O(|T| + occ) using a data structure that occupies O(m log m) bits of space, where m <= n + 1 is the number of states in the automaton. In this paper we show that the Aho-Corasick automaton can be represented in just m(log sigma + O(1)) + O(d log(n/d)) bits of space while still maintaining the ability to answer queries in O(|T| + occ) time. To the best of our knowledge, the currently fastest succinct data structure for the dictionary matching problem uses space O(n log sigma)…
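For reference, the classical pointer-based Aho-Corasick automaton that the abstract takes as its baseline can be sketched as follows. This is the textbook structure, not the succinct representation proposed in the paper; the names `build_automaton` and `find_all` are hypothetical, patterns are assumed to be non-empty, and the per-state Python dictionaries give expected rather than worst-case constant-time transitions.

```python
from collections import deque


def build_automaton(patterns):
    """Build goto, failure and report links for non-empty patterns."""
    goto, fail, out = [{}], [0], [[]]   # state 0 is the root
    for idx, pat in enumerate(patterns):
        s = 0
        for ch in pat:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].append(idx)              # pattern idx ends at state s
    # report[s]: nearest state on the failure chain of s that ends a pattern (0 = none).
    report = [0] * len(goto)
    queue = deque(goto[0].values())     # depth-1 states keep fail = 0
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            report[t] = fail[t] if out[fail[t]] else report[fail[t]]
            queue.append(t)
    return goto, fail, out, report


def find_all(patterns, text):
    """Return (start_index, pattern) for every occurrence of a pattern in text."""
    goto, fail, out, report = build_automaton(patterns)
    hits, s = [], 0
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        # Follow report links only, so the work here is proportional to the matches.
        r = s if out[s] else report[s]
        while r:
            for idx in out[r]:
                hits.append((i - len(patterns[idx]) + 1, patterns[idx]))
            r = report[r]
    return hits


print(find_all(["he", "she", "his", "hers"], "ushers"))
```

Each state here stores explicit references for its transitions, failure link and report link, which is where the O(m log m) bits of the classical solution come from.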
