On building minimal automaton for subset matching queries

Kimmo Fredriksson

arXiv:1004.0902 · cs.FL · October 4, 2010

TL;DR

This paper introduces a method for constructing minimal automata to efficiently answer subset matching queries on sets of strings with subset-labeled positions, with applications in biology and music retrieval.

Contribution

It presents an indexing technique for subset matching queries whose average construction time depends on the alphabet size σ and the average subset size Δ.

Findings

01

Index construction in $O(n^{\log_{σ/Δ}(σ)} \log n)$ average time

02

Efficient subset matching query answering

03

Applications in computational biology and music information retrieval
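
As a rough reading of the construction bound in the first finding, the exponent can be rewritten with natural logarithms. This sketch assumes $1 \le Δ < σ$, so that the base $σ/Δ$ exceeds 1; the numeric values below are illustrative only and are not taken from the paper.

$$
\log_{σ/Δ}(σ) = \frac{\ln σ}{\ln σ - \ln Δ}
$$

$$
σ = 16,\ Δ = 2:\quad \log_{8}(16) = \frac{4}{3}, \qquad\text{so the bound reads } O(n^{4/3} \log n)
$$

Under the same assumption, $Δ = 1$ gives exponent 1, and the exponent stays below 2 exactly when $Δ < \sqrt{σ}$.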

Abstract

We address the problem of building an index for a set $D$ of $n$ strings, where each string location is a subset of some finite integer alphabet of size $σ$, so that we can efficiently determine whether a given simple query string $p$ (where each string location is a single symbol) occurs in the set. That is, we need to efficiently find a string $d \in D$ such that $p[i] \in d[i]$ for every $i$. We show how to build such an index in $O(n^{\log_{σ/Δ}(σ)} \log n)$ average time, where $Δ$ is the average size of the subsets. Our methods have applications, e.g., in computational biology (haplotype inference) and music information retrieval.
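
To make the query condition $p[i] \in d[i]$ concrete, here is a minimal Python sketch of the matching test as a naive linear scan over $D$. It is a baseline for illustration only, not the automaton-based index the paper builds. The names `matches` and `find_match`, the sample data, and the requirement that $p$ and $d$ have equal length are all assumptions made for this example.

```python
from typing import FrozenSet, Optional, Sequence

# A subset-labeled string: each position holds a set of allowed symbols.
SubsetString = Sequence[FrozenSet[int]]


def matches(p: Sequence[int], d: SubsetString) -> bool:
    """Return True if p[i] is in d[i] for every i.

    Assumption for this sketch: p and d must have the same length.
    """
    return len(p) == len(d) and all(sym in pos for sym, pos in zip(p, d))


def find_match(p: Sequence[int], D: Sequence[SubsetString]) -> Optional[int]:
    """Naive scan: return the index of the first d in D matched by p, or None."""
    for idx, d in enumerate(D):
        if matches(p, d):
            return idx
    return None


# Hypothetical data over the integer alphabet {0, 1, 2, 3}.
D = [
    [frozenset({0, 1}), frozenset({2}), frozenset({1, 3})],
    [frozenset({0}), frozenset({0, 2}), frozenset({3})],
]
```

Each query in this baseline costs time proportional to the total length of the strings in $D$, which is the per-query cost an index is meant to avoid.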

Taxonomy

Topics: Algorithms and Data Compression · Machine Learning and Algorithms · Network Packet Processing and Optimization