Efficient repeat finding via suffix arrays
Veronica Becher, Alejandro Deymonnaz, Pablo Ariel Heiber

TL;DR
This paper introduces an efficient method for identifying interspersed maximal repeats in large datasets using suffix arrays, offering a practical alternative to suffix trees with proven correctness and complexity.
Contribution
The paper presents a suffix-array-based algorithm for finding interspersed maximal repeats that improves on suffix-tree-based approaches, particularly on very large inputs.
Findings
Improved efficiency over suffix tree methods
Suitable for very large datasets
Proven correctness and complexity
Abstract
We solve the problem of finding interspersed maximal repeats using a suffix array construction. As is well known, all the functionality of suffix trees can be handled by suffix arrays, gaining practicality. Our solution improves on the suffix-tree-based approaches to the repeat finding problem, being particularly well suited for very large inputs. We prove the correctness and complexity of the algorithms.
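To make the idea in the abstract concrete, here is a minimal sketch of how plain maximal repeats can be read off a suffix array together with its longest-common-prefix (LCP) array. It is not the authors' algorithm: it ignores the "interspersed" condition, builds the suffix array naively by sorting suffixes, and scans intervals in quadratic time. The function names (`suffix_array`, `lcp_array`, `maximal_repeats`) are illustrative assumptions, not names from the paper.

```python
def suffix_array(s):
    # Naive construction (assumption: fine for short inputs only).
    return sorted(range(len(s)), key=lambda i: s[i:])


def lcp_array(s, sa):
    # Kasai-style computation: lcp[r] is the length of the longest common
    # prefix of the suffixes at ranks r-1 and r; lcp[0] is 0.
    n = len(s)
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and s[i + h] == s[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1
        else:
            h = 0
    return lcp


def maximal_repeats(s):
    # Returns {repeat: sorted start positions} for substrings that occur at
    # least twice and cannot be extended left or right without losing
    # occurrences.
    sa = suffix_array(s)
    lcp = lcp_array(s, sa)
    n = len(s)
    found = {}
    for r in range(1, n):
        length = lcp[r]
        if length == 0:
            continue
        # Grow the rank interval [lo, hi] whose suffixes share this prefix.
        lo = r - 1
        while lo > 0 and lcp[lo] >= length:
            lo -= 1
        hi = r
        while hi + 1 < n and lcp[hi + 1] >= length:
            hi += 1
        starts = sa[lo:hi + 1]
        # Left-maximal if the preceding characters differ; the text start
        # (no preceding character) counts as a distinct symbol.
        preceding = {s[i - 1] if i > 0 else None for i in starts}
        if len(preceding) > 1:
            found[s[sa[r]:sa[r] + length]] = sorted(starts)
    return found
```

For example, `maximal_repeats("banana")` reports `"a"` at positions 1, 3, 5 and `"ana"` at positions 1, 3, while `"na"` is excluded because both of its occurrences are preceded by the same character. A practical implementation for very large inputs would replace the naive sort with a linear-time or near-linear-time suffix array construction.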
Taxonomy
Topics: Algorithms and Data Compression · Genomics and Phylogenetic Studies · RNA and protein synthesis mechanisms
