A linear-time algorithm for finding the longest segment which scores above a given threshold
Miklós Csűrös

TL;DR
This paper introduces a linear-time algorithm to identify the longest sequence segment with a sum or average above a specified threshold, applicable in DNA analysis and genome assembly preprocessing.
Contribution
It presents a novel linear-time algorithm for finding the longest segment whose sum, or whose average, exceeds a given threshold, improving efficiency over previous methods.
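The average version of the problem reduces to the sum version. In the display below, $x_k$ denotes the scores and $\alpha$ the average threshold; this notation is introduced here for illustration and is not taken from the paper. For any non-empty segment $i \le j$:

$$\frac{1}{j-i+1}\sum_{k=i}^{j} x_k \ge \alpha \iff \sum_{k=i}^{j} (x_k - \alpha) \ge 0,$$

The equivalence holds because the segment length $j-i+1$ is positive, and it holds equally with strict inequalities. Subtracting $\alpha$ from every score and asking for a segment sum of at least $0$ therefore turns the average problem into the sum problem.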
Findings
Algorithm runs in linear time
Applicable to DNA sequence analysis
Useful for genome assembly preprocessing
Abstract
This paper describes a linear-time algorithm that finds the longest stretch in a sequence of real numbers ("scores") in which the sum exceeds an input parameter. The algorithm also solves the problem of finding the longest interval in which the average of the scores is above a fixed threshold. The problem originates from molecular sequence analysis: for instance, the algorithm can be employed to identify long GC-rich regions in DNA sequences. The algorithm can also be used to trim low-quality ends of shotgun sequences in a preprocessing step of whole-genome assembly.
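The abstract does not spell out the algorithm. Below is a minimal Python sketch of one standard linear-time approach. It uses prefix sums, running prefix minima and suffix maxima, and two pointers that only move forward. It is not necessarily the method of the paper. The sketch makes these assumptions:

- "Exceeds" is read as "is at least".
- Segments are returned as half-open `(start, end)` slice bounds.
- The average case uses the subtract-the-threshold reduction.
- GC content is scored 1 for G/C and 0 otherwise.

The function names `longest_segment_min_sum`, `longest_segment_min_average` and `gc_scores` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def longest_segment_min_sum(
    scores: Sequence[float], threshold: float
) -> Optional[Tuple[int, int]]:
    """Return (start, end) such that scores[start:end] is a longest segment
    whose sum is at least `threshold` (assumed reading of "exceeds"),
    or None if no segment qualifies. The empty segment (sum 0) counts.
    Runs in O(n) time and O(n) extra space.
    """
    n = len(scores)
    # prefix[k] = scores[0] + ... + scores[k-1]; scores[i:j] sums to prefix[j] - prefix[i].
    prefix = [0.0] * (n + 1)
    for k, x in enumerate(scores):
        prefix[k + 1] = prefix[k] + x

    # left_min[i] = min(prefix[0..i]); nonincreasing in i.
    left_min = prefix[:]
    for k in range(1, n + 1):
        left_min[k] = min(left_min[k - 1], prefix[k])

    # right_max[j] = max(prefix[j..n]); nonincreasing in j.
    right_max = prefix[:]
    for k in range(n - 1, -1, -1):
        right_max[k] = max(right_max[k + 1], prefix[k])

    best: Optional[Tuple[int, int]] = None
    best_len = -1
    j = 0
    for i in range(n + 1):
        # For fixed i, the ends j with right_max[j] - left_min[i] >= threshold form a
        # prefix 0..f(i), and f(i) never decreases as i grows, so j only moves forward.
        while j <= n and right_max[j] - left_min[i] >= threshold:
            j += 1
        end = j - 1  # f(i); a segment starting at i exists only if end >= i
        if end - i > best_len:
            # At the final (maximal) length, prefix[end] - prefix[i] >= threshold holds
            # for this exact pair: any other witness would be a strictly longer segment.
            best_len = end - i
            best = (i, end)
    return best


def longest_segment_min_average(
    scores: Sequence[float], alpha: float
) -> Optional[Tuple[int, int]]:
    """Longest non-empty segment whose average is at least `alpha`, via
    mean(scores[s:e]) >= alpha  <=>  sum(x - alpha for x in scores[s:e]) >= 0."""
    shifted = [x - alpha for x in scores]
    seg = longest_segment_min_sum(shifted, 0.0)
    if seg is None or seg[0] == seg[1]:
        return None  # only the empty segment qualifies
    return seg


def gc_scores(dna: str) -> List[float]:
    """Hypothetical scoring: 1.0 for G/C, 0.0 for any other base."""
    return [1.0 if base in "GCgc" else 0.0 for base in dna]


if __name__ == "__main__":
    print(longest_segment_min_sum([2, -1, -1, 3, -5, 1], 2))  # (0, 4)
    # Longest stretch with GC fraction >= 0.5:
    print(longest_segment_min_average(gc_scores("ATATGCGCGGATAT"), 0.5))  # (0, 12)
```

Each pointer advances at most n + 1 times, so the whole scan is linear. The same wrapper could plausibly serve the read-trimming use case by feeding per-base quality values and a quality cutoff as `alpha`. That mapping is an assumption, not something the abstract specifies.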
