Locating regions in a sequence under density constraints

Benjamin A. Burton; Mathias Hiron

arXiv:1104.0919·cs.DS·August 15, 2013

TL;DR

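This paper presents algorithms for locating regions of a binary sequence whose density of 1s lies within a target range. The longest such region is found in O(n) time and related problems are solved in O(n log log n) time, both improving on earlier O(n log n) methods, and practical tests show reduced memory use, allowing much longer biological sequences to be analysed.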

Contribution

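The authors develop an O(n)-time algorithm for finding the longest substring whose proportion of 1s lies between given lower and upper bounds, and O(n log log n) algorithms for related problems (finding the shortest such substring or a maximal set of disjoint substrings). Both improve on the previous best-known O(n log n) results.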

Findings

01

The longest-substring algorithm runs in O(n) time and the algorithms for the related problems run in O(n log log n) time, compared with the previous best-known O(n log n).

02

Practical tests show reduced memory use and ability to process longer sequences.

03

The new algorithms outperform existing solutions in both speed and memory efficiency.

Abstract

Several biological problems require the identification of regions in a sequence where some feature occurs within a target density range: examples include the location of GC-rich regions, identification of CpG islands, and sequence matching. Mathematically, this corresponds to searching a string of 0s and 1s for a substring whose relative proportion of 1s lies between given lower and upper bounds. We consider the algorithmic problem of locating the longest such substring, as well as other related problems (such as finding the shortest substring or a maximal set of disjoint substrings). For locating the longest such substring, we develop an algorithm that runs in O(n) time, improving upon the previous best-known O(n log n) result. For the related problems we develop O(n log log n) algorithms, again improving upon the best-known O(n log n) results. Practical testing verifies that our new…
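
To make the problem statement concrete, the following is a minimal brute-force sketch of the longest-substring problem. It is not the O(n) algorithm from the paper: it simply tries every substring, longest first, using prefix sums of the 1s, and so takes O(n^2) time. The function name `longest_in_density_range`, the half-open `(start, end)` return convention, and the use of `Fraction` for exact bound comparisons are all assumptions made for illustration.

```python
from fractions import Fraction


def longest_in_density_range(bits, lo, hi):
    """Return (start, end) of a longest substring bits[start:end] whose
    proportion of '1' characters lies in [lo, hi], or None if none exists.

    Illustrative O(n^2) baseline only; not the paper's linear-time method.
    """
    n = len(bits)

    # prefix[i] = number of 1s in bits[:i]
    prefix = [0] * (n + 1)
    for i, ch in enumerate(bits):
        prefix[i + 1] = prefix[i] + (1 if ch == "1" else 0)

    # Try lengths from longest to shortest; the first hit is a longest one.
    for length in range(n, 0, -1):
        for start in range(n - length + 1):
            ones = prefix[start + length] - prefix[start]
            # Compare ones/length against the bounds without dividing.
            if lo * length <= ones <= hi * length:
                return (start, start + length)
    return None


if __name__ == "__main__":
    # Longest region of "0011101100" that is at least 75% ones.
    print(longest_in_density_range("0011101100", Fraction(3, 4), Fraction(1)))
```

Passing the bounds as `Fraction` values keeps the comparison `lo * length <= ones <= hi * length` exact, which avoids floating-point edge cases when a substring sits precisely on a bound.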
