Pattern Matching and Consensus Problems on Weighted Sequences and Profiles
Tomasz Kociumaka, Solon P. Pissis, and Jakub Radoszewski

TL;DR
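This paper develops efficient algorithms for pattern matching on two representations of uncertain biological sequences, weighted sequences and profiles, and for a related consensus problem in which both sequences have equal length. The consensus algorithms adapt the meet-in-the-middle algorithm for the knapsack problem, and the paper states optimality bounds tied to the complexity of knapsack.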
Contribution
The paper introduces algorithms for pattern matching on uncertain sequences and for the consensus problem, using an adaptation of the classic meet-in-the-middle algorithm for knapsack, and establishes optimality bounds for these algorithms.
Findings
Efficient algorithms for the simple version of pattern matching on weighted sequences and profiles, in which only the pattern or only the text is uncertain.
Algorithms for the consensus problem, parameterized by the number of strings that match one of the sequences.
Optimality bounds for the algorithms, based on the complexity of the knapsack problem.
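To make the simple profile-matching setting concrete, here is a minimal Python sketch of basic lookahead scoring, assuming a profile is given as a list of per-position score tables and a match is a text position whose total score reaches a threshold. The function name `profile_matches`, the dictionary representation, and the treatment of missing letters as a score of minus infinity are illustrative assumptions; the paper uses a variation of this technique, which is not reproduced here.

```python
def profile_matches(profile, text, threshold):
    """Return start positions where `text` scores at least `threshold`.

    Assumed (hypothetical) representation: `profile` is a non-empty list of
    dicts, one per pattern position, mapping a letter to its score.
    """
    m = len(profile)
    if m == 0:
        raise ValueError("profile must have at least one position")

    # suffix_max[j] = best score still obtainable from positions j..m-1.
    suffix_max = [0] * (m + 1)
    for j in range(m - 1, -1, -1):
        suffix_max[j] = suffix_max[j + 1] + max(profile[j].values())

    matches = []
    for i in range(len(text) - m + 1):
        score = 0
        for j in range(m):
            # Letters absent from a column are treated as impossible.
            score += profile[j].get(text[i + j], float("-inf"))
            # Lookahead: stop once even the best remaining letters
            # cannot lift the score to the threshold.
            if score + suffix_max[j + 1] < threshold:
                break
        else:
            # No early exit, so the full score is at least the threshold.
            matches.append(i)
    return matches


example_profile = [{"A": 2, "C": -1}, {"A": -1, "C": 3}]
example_hits = profile_matches(example_profile, "AACAC", 4)
```

In this toy input only the windows "AC" reach the threshold; the window starting with "C" is abandoned after its first letter, which is the saving that lookahead provides over scoring every window in full.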
Abstract
We study pattern matching problems on two major representations of uncertain sequences used in molecular biology: weighted sequences (also known as position weight matrices, PWM) and profiles (i.e., scoring matrices). In the simple version, in which only the pattern or only the text is uncertain, we obtain efficient algorithms with theoretically-provable running times using a variation of the lookahead scoring technique. We also consider a general variant of the pattern matching problems in which both the pattern and the text are uncertain. Central to our solution is a special case where the sequences have equal length, called the consensus problem. We propose algorithms for the consensus problem parameterized by the number of strings that match one of the sequences. As our basic approach, a careful adaptation of the classic meet-in-the-middle algorithm for the knapsack problem is used.…
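For readers unfamiliar with the technique named at the end of the abstract, the following is a minimal Python sketch of the classic meet-in-the-middle algorithm for 0/1 knapsack, assuming non-negative integer weights and values. It is not the paper's adaptation to the consensus problem; the names `subset_totals` and `knapsack_mitm` are illustrative.

```python
from bisect import bisect_right


def subset_totals(items):
    """List the (weight, value) totals of every subset of `items`."""
    totals = [(0, 0)]
    for weight, value in items:
        totals += [(w + weight, v + value) for w, v in totals]
    return totals


def knapsack_mitm(items, capacity):
    """Best total value with total weight <= capacity.

    Assumes non-negative weights and values. Enumerates about 2^(n/2)
    subsets per half instead of 2^n subsets overall.
    """
    half = len(items) // 2
    left = subset_totals(items[:half])
    right = sorted(subset_totals(items[half:]))

    # right_best[k] = best value among right subsets no heavier than
    # the k-th lightest one.
    right_weights = [w for w, _ in right]
    right_best = []
    best = 0
    for _, v in right:
        best = max(best, v)
        right_best.append(best)

    answer = 0
    for w, v in left:
        remaining = capacity - w
        if remaining < 0:
            continue
        # Count right subsets that fit; the empty subset always does.
        k = bisect_right(right_weights, remaining)
        answer = max(answer, v + right_best[k - 1])
    return answer


example_items = [(3, 4), (4, 5), (2, 3), (5, 8)]  # (weight, value) pairs
example_best = knapsack_mitm(example_items, 7)
```

The key idea is that each half is enumerated independently and the halves are combined with a sorted lookup, so the exponent in the running time is halved at the cost of storing one half's subsets.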
