An index for regular expression queries: Design and implementation

Dominic Tsang; Sanjay Chawla

arXiv:1108.1228 · cs.DB · August 16, 2011 · 2 citations

TL;DR

This paper introduces a new indexing method for regular expression queries in databases. It formulates index construction as an optimization problem and provides algorithms with proven guarantees, significantly improving query performance.

Contribution

It presents a novel, robust approach to indexing regular expression queries: the multigrams (substrings) used as index keys are generated by solving an optimization problem, using algorithms with theoretical guarantees.
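To make the selection step concrete, here is a minimal Python sketch of one way multigrams could be chosen from a query workload. It is not the authors' algorithm. It assumes each workload query has already been reduced to the literal fragments that every match must contain (a hypothetical preprocessing step), scores a query by the size of the smallest posting list among its selected multigrams (or a full scan if none is selected), and greedily adds up to `k` multigrams that most reduce the total. The names `candidate_grams` and `select_multigrams` and the length bounds are illustrative.

```python
from typing import Dict, List, Optional, Set


def candidate_grams(fragments: List[str], min_len: int = 2, max_len: int = 4) -> Set[str]:
    """All substrings of the required fragments with length in [min_len, max_len].
    A substring of a required fragment is itself required by the query."""
    grams: Set[str] = set()
    for frag in fragments:
        for i in range(len(frag)):
            for j in range(i + min_len, min(i + max_len, len(frag)) + 1):
                grams.add(frag[i:j])
    return grams


def select_multigrams(records: List[str], workload: List[List[str]], k: int) -> List[str]:
    """Greedily choose up to k multigrams that minimise total candidate-set size.

    workload[q] lists literal fragments that every match of query q must contain
    (hypothetical preprocessing, not shown). A query's cost is the smallest
    posting-list size among its chosen multigrams, or len(records) (full scan).
    """
    n = len(records)
    per_query = [candidate_grams(frags) for frags in workload]
    all_grams: Set[str] = set().union(*per_query)
    # Posting-list size of every candidate multigram.
    support: Dict[str, int] = {g: sum(g in r for r in records) for g in all_grams}
    cost = [n] * len(workload)  # every query starts as a full scan
    chosen: List[str] = []
    for _ in range(k):
        best: Optional[str] = None
        best_gain = 0
        for g in sorted(all_grams - set(chosen)):  # sorted: deterministic ties
            gain = sum(
                max(0, cost[q] - support[g])
                for q, grams in enumerate(per_query)
                if g in grams
            )
            if gain > best_gain:
                best, best_gain = g, gain
        if best is None:  # no remaining multigram lowers any query's cost
            break
        chosen.append(best)
        for q, grams in enumerate(per_query):
            if best in grams:
                cost[q] = min(cost[q], support[best])
    return chosen


if __name__ == "__main__":
    records = ["abcde", "abcdx", "qqqqq", "xyzab", "cdefg"]
    workload = [["cde"], ["xyz"]]  # required fragments of two queries
    print(select_multigrams(records, workload, 2))  # ['xy', 'cde']
```

In the example, "xy" is picked first because it occurs in only one record, and "cde" beats "cd" because it is more selective. The benefit of a selection (full-scan cost minus current cost, summed over queries) has a facility-location form, which is monotone submodular, so a greedy rule like this carries the standard 1 - 1/e approximation guarantee under a cardinality budget. Whether this matches the guarantees proved in the paper is not established here.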

Findings

1. Accurate and efficient indexing demonstrated on synthetic datasets
2. Effective indexing of complex PROSITE protein patterns
3. First practical indexing mechanism for regular expression queries

Abstract

The LIKE regular expression predicate has been part of the SQL standard since at least 1989. However, despite its popularity and wide usage, database vendors provide only limited indexing support for regular expression queries, which almost always require a full table scan. In this paper we propose a rigorous and robust approach for providing indexing support for regular expression queries. Our approach consists of formulating the indexing problem as a combinatorial optimization problem. We begin with a database, abstracted as a collection of strings. From this data set we generate a query workload. The inputs to the optimization problem are the database and the workload. The output is a set of multigrams (substrings) that can be used as keys to the records satisfying the query workload. The multigrams can then be integrated with a data structure (such as a B+ tree) to provide indexing…
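The idea of multigrams as keys to records can be illustrated with a small inverted index. In this minimal Python sketch (an illustration under stated assumptions, not the paper's implementation), a dictionary stands in for the B+ tree, each selected multigram maps to the ids of records containing it, and a query intersects the posting lists of the multigrams that occur inside its required literal fragments before verifying the survivors with `re.search`. The function names `build_index` and `run_query` and the `required_fragments` argument are hypothetical; the fragments must genuinely appear in every match, otherwise true results would be pruned.

```python
import re
from typing import Dict, List, Set, Tuple


def build_index(records: List[str], multigrams: List[str]) -> Dict[str, Set[int]]:
    """Map each multigram to the ids of the records containing it (its posting list).
    A plain dict stands in for a B+ tree or other key-value index."""
    return {g: {i for i, r in enumerate(records) if g in r} for g in multigrams}


def run_query(
    records: List[str],
    index: Dict[str, Set[int]],
    pattern: str,
    required_fragments: List[str],
) -> Tuple[List[int], int]:
    """Return (ids of matching records, number of candidates examined).

    required_fragments: literal substrings that every match of `pattern`
    must contain (hypothetical input, supplied by the caller here).
    """
    # An indexed multigram is usable if it lies inside some required fragment,
    # because then every matching record must contain it too.
    usable = [g for g in index if any(g in f for f in required_fragments)]
    if usable:
        # Records must contain all usable multigrams: intersect posting lists.
        candidates = set.intersection(*(index[g] for g in usable))
    else:
        # No usable key: fall back to a full table scan.
        candidates = set(range(len(records)))
    rx = re.compile(pattern)
    # Verify each candidate with the real regex so results stay exact.
    matches = sorted(i for i in candidates if rx.search(records[i]))
    return matches, len(candidates)


if __name__ == "__main__":
    records = ["abcde", "abcdx", "qqqqq", "xyzab", "cdefg"]
    index = build_index(records, ["xy", "cde"])
    print(run_query(records, index, r"cde+f", ["cde", "ef"]))  # ([4], 2)
    print(run_query(records, index, r"xyz|abx", []))  # ([3], 5)
```

The index only narrows the candidate set; the final regex check keeps answers exact. The second query has no fragment shared by all of its matches, so it falls back to the full scan that the abstract describes as the usual case without index support.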

Taxonomy

Topics: Algorithms and Data Compression · Advanced Database Systems and Queries · Network Packet Processing and Optimization