Suffix Tree of Alignment: An Efficient Index for Similar Data

Joong Chae Na, Heejin Park, Maxime Crochemore, Jan Holub, Costas S. Iliopoulos, Laurent Mouchard, and Kunsoo Park

arXiv:1305.1744 · cs.DS · May 9, 2013

TL;DR

This paper introduces a space- and time-efficient suffix tree for aligned similar strings. It exploits their similarity to reduce space and construction time while keeping pattern search efficient.

Contribution

It proposes the suffix tree of alignment, which exploits the similarity captured by an alignment of two strings so that it needs fewer leaves than the traditional generalized suffix tree.

Findings

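1. The suffix tree of alignment has fewer leaves than the generalized suffix tree when the strings are similar.

2. Pattern search remains efficient, taking $O(|P| + occ)$ time, where $occ$ is the number of occurrences of the pattern $P$ in $A$ and $B$.

3. Efficient construction algorithms are given for two starting points: building the tree from scratch, or starting from an existing suffix tree of $A$.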
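
For reference, the baseline the paper improves on can be sketched in a few lines: a generalized suffix tree of $A$ and $B$ with one leaf per suffix, and a search that walks down $|P|$ characters and then reports every leaf below that point. This is a minimal sketch, not the suffix tree of alignment itself. Construction here is naive quadratic insertion rather than a linear-time algorithm, and the names `build_gst`, `search`, and `count_leaves`, as well as the private-use terminator characters, are illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    # first character of an outgoing edge -> (edge label, child node)
    children: dict[str, tuple[str, Node]] = field(default_factory=dict)
    # (string id, start position) for leaves; None for internal nodes
    leaf: tuple[int, int] | None = None


def _insert(root: Node, suffix: str, label: tuple[int, int]) -> None:
    """Insert one terminated suffix, splitting an edge where it diverges."""
    node = root
    while True:
        first = suffix[0]
        if first not in node.children:
            node.children[first] = (suffix, Node(leaf=label))
            return
        edge, child = node.children[first]
        k = 0  # length of the common prefix of edge and suffix
        while k < len(edge) and k < len(suffix) and edge[k] == suffix[k]:
            k += 1
        if k == len(edge):  # the whole edge matches: descend
            node, suffix = child, suffix[k:]
            continue
        # Diverges inside the edge: split it with a new internal node.
        mid = Node()
        mid.children[edge[k]] = (edge[k:], child)
        mid.children[suffix[k]] = (suffix[k:], Node(leaf=label))
        node.children[first] = (edge[:k], mid)
        return


def build_gst(strings: list[str]) -> Node:
    """Naive O(n^2) generalized suffix tree with one leaf per suffix of each string."""
    root = Node()
    for sid, s in enumerate(strings):
        # Unique private-use terminator per string (assumed absent from the input).
        text = s + chr(0xE000 + sid)
        for i in range(len(s)):
            _insert(root, text[i:], (sid, i))
    return root


def count_leaves(node: Node) -> int:
    own = 1 if node.leaf is not None else 0
    return own + sum(count_leaves(c) for _, c in node.children.values())


def search(root: Node, pattern: str) -> list[tuple[int, int]]:
    """Walk down |P| characters, then report every leaf in the subtree reached."""
    node, rest = root, pattern
    while rest:
        if rest[0] not in node.children:
            return []
        edge, child = node.children[rest[0]]
        m = min(len(edge), len(rest))
        if edge[:m] != rest[:m]:
            return []
        node, rest = child, rest[m:]
    hits, stack = [], [node]
    while stack:
        n = stack.pop()
        if n.leaf is not None:
            hits.append(n.leaf)
        stack.extend(c for _, c in n.children.values())
    return sorted(hits)


A, B = "ACGTACGT", "ACGAACGT"
root = build_gst([A, B])
print(count_leaves(root))   # 16 == |A| + |B|
print(search(root, "ACG"))  # [(0, 0), (0, 4), (1, 0), (1, 4)]
```

Every internal node created by a split has at least two children, so the subtree below the search point has fewer internal nodes than leaves. That is where the $occ$ term comes from; the $|P|$ term is the walk down.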

Abstract

We consider an index data structure for similar strings. The generalized suffix tree can be a solution for this. The generalized suffix tree of two strings $A$ and $B$ is a compacted trie representing all suffixes in $A$ and $B$. It has $|A| + |B|$ leaves and can be constructed in $O(|A| + |B|)$ time. However, if the two strings are similar, the generalized suffix tree is not efficient because it does not exploit the similarity which is usually represented as an alignment of $A$ and $B$. In this paper we propose a space/time-efficient suffix tree of alignment which wisely exploits the similarity in an alignment. Our suffix tree for an alignment of $A$ and $B$ has $|A| + l_d + l_1$ leaves where $l_d$ is the sum of the lengths of all parts of $B$ different from $A$ and $l_1$ is the sum of the lengths of some common parts of $A$ and $B$. We did not compromise the pattern search to reduce the…
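
To make the two leaf counts concrete, here is a small sketch that represents an alignment as a hypothetical list of `Same` and `Diff` blocks (this representation is an assumption, not the paper's), recovers $A$ and $B$ from it, and evaluates $|A| + |B|$ against $|A| + l_d + l_1$. The abstract does not say which common parts make up $l_1$, so the sketch takes $l_1$ as an input rather than computing it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Same:
    """A block shared by A and B in the alignment (hypothetical representation)."""
    text: str


@dataclass(frozen=True)
class Diff:
    """A block where A has a_part and B has b_part (either may be empty)."""
    a_part: str
    b_part: str


def strings_from_alignment(blocks: list) -> tuple[str, str]:
    a = "".join(blk.text if isinstance(blk, Same) else blk.a_part for blk in blocks)
    b = "".join(blk.text if isinstance(blk, Same) else blk.b_part for blk in blocks)
    return a, b


def l_d(blocks: list) -> int:
    # Sum of the lengths of all parts of B that differ from A.
    return sum(len(blk.b_part) for blk in blocks if isinstance(blk, Diff))


def gst_leaves(blocks: list) -> int:
    a, b = strings_from_alignment(blocks)
    return len(a) + len(b)  # |A| + |B|


def alignment_leaves(blocks: list, l_1: int) -> int:
    # |A| + l_d + l_1; l_1 is supplied by the caller because the abstract
    # only describes it as "some common parts" of A and B.
    a, _ = strings_from_alignment(blocks)
    return len(a) + l_d(blocks) + l_1


blocks = [Same("ACG"), Diff("T", "A"), Same("ACGT")]
print(strings_from_alignment(blocks))   # ('ACGTACGT', 'ACGAACGT')
print(gst_leaves(blocks))               # 16
print(alignment_leaves(blocks, l_1=0))  # 9, i.e. |A| + l_d with l_1 set to 0 for illustration
```

With a single substituted character, the generalized suffix tree needs 16 leaves while $|A| + l_d$ is 9; the suffix tree of alignment would need $9 + l_1$.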

Taxonomy

Topics: Algorithms and Data Compression · Network Packet Processing and Optimization · Genomics and Phylogenetic Studies