Efficient repeat finding via suffix arrays

Veronica Becher; Alejandro Deymonnaz; Pablo Ariel Heiber

arXiv:1304.0528 · cs.DS · April 3, 2013 · 1 cites

TL;DR

This paper introduces an efficient method for finding interspersed maximal repeats in very large inputs using suffix arrays, offering a practical alternative to suffix tree approaches, with proofs of correctness and complexity.

Contribution

It presents a suffix-array-based algorithm for repeat finding that improves on suffix tree approaches, especially on very large inputs.

Findings

01 Improved efficiency over suffix tree methods
02 Suitable for very large datasets
03 Proven correctness and complexity

Abstract

We solve the problem of finding interspersed maximal repeats using a suffix array construction. As is well known, all the functionality of suffix trees can be handled by suffix arrays, gaining practicality. Our solution improves on the suffix tree based approaches to the repeat finding problem, and is particularly well suited for very large inputs. We prove the correctness and complexity of the algorithms.
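
To make the idea of repeat finding over a suffix array concrete, here is a minimal Python sketch. It is not the algorithm from the paper, which this page does not reproduce. It assumes the common textbook definition of a maximal repeat (a substring occurring at least twice that cannot be extended to the left or to the right without losing an occurrence), and it uses a naive suffix sort and naive LCP computation that are only suitable for short strings. The function names suffix_array, lcp_array and maximal_repeats are hypothetical, chosen for this illustration.

```python
def suffix_array(s):
    """Naive construction: sort suffix start positions lexicographically."""
    return sorted(range(len(s)), key=lambda i: s[i:])


def lcp_array(s, sa):
    """lcp[i] = length of the longest common prefix of the suffixes
    starting at sa[i - 1] and sa[i]; lcp[0] is 0 by convention."""
    def common(a, b):
        k = 0
        while a + k < len(s) and b + k < len(s) and s[a + k] == s[b + k]:
            k += 1
        return k
    return [0] + [common(sa[i - 1], sa[i]) for i in range(1, len(sa))]


def maximal_repeats(s):
    """Return {repeat: sorted start positions} for every maximal repeat of s.

    Assumed definition (not taken from the paper): the repeat occurs at
    least twice and is both right-maximal and left-maximal.
    """
    sa = suffix_array(s)
    lcp = lcp_array(s, sa)
    n = len(s)
    found = {}
    for i in range(1, n):
        length = lcp[i]
        if length == 0:
            continue
        # Widen to the full block of adjacent suffixes sharing a prefix of
        # this length. Because lcp[i] == length exactly, the block is
        # right-maximal: two of its suffixes differ right after the prefix.
        lo = i - 1
        while lo > 0 and lcp[lo] >= length:
            lo -= 1
        hi = i
        while hi + 1 < n and lcp[hi + 1] >= length:
            hi += 1
        starts = sorted(sa[lo:hi + 1])
        # Left-maximal if the preceding characters differ; None stands for
        # "no preceding character" when an occurrence starts at position 0.
        left = {s[p - 1] if p > 0 else None for p in starts}
        if len(left) > 1:
            found[s[starts[0]:starts[0] + length]] = starts
    return found
```

For example, maximal_repeats("xabcyabcz") reports only "abc" at positions 1 and 5: "bc" is always preceded by "a" and "ab" is always followed by "c", so neither is maximal. The widening loops make this sketch quadratic in the worst case; it illustrates the definitions rather than the efficiency the abstract refers to.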

Code & Models

Repositories

foshardware/lsc
none

Taxonomy

Topics: Algorithms and Data Compression · Genomics and Phylogenetic Studies · RNA and protein synthesis mechanisms