Discovering Patterns in Biological Sequences by Optimal Segmentation
Joseph Bockhorst, Nebojsa Jojic

TL;DR
This paper presents a dynamic programming approach that optimally segments aligned biological sequences into correlated regions. It applies the segmentation to vaccine design and SNP prediction, where it outperforms existing SNP prediction methods.
Contribution
It introduces a novel Bayesian network-based segmentation method with an efficient dynamic programming algorithm for biological sequence analysis.
Findings
Error rates for SNP prediction reduced by up to one-third
Method effectively identifies correlated segments in biological sequences
Outperforms state-of-the-art SNP prediction methods
Abstract
Computational methods for discovering patterns of local correlations in sequences are important in computational biology. Here we show how to determine the optimal partitioning of aligned sequences into non-overlapping segments such that positions in the same segment are strongly correlated while positions in different segments are not. Our approach involves discovering the hidden variables of a Bayesian network that interact with observed sequences so as to form a set of independent mixture models. We introduce a dynamic program to efficiently discover the optimal segmentation, or equivalently the optimal set of hidden variables. We evaluate our approach on two computational biology tasks. One task is related to the design of vaccines against polymorphic pathogens and the other task involves analysis of single nucleotide polymorphisms (SNPs) in human DNA. We show how common tasks in…
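The dynamic program mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the paper scores segments with mixture models in a Bayesian network, whereas this sketch swaps in a much simpler stand-in score, the log-likelihood of the empirical distribution over the distinct substrings in a segment, minus a penalty per free parameter. The names `segment_score`, `optimal_segmentation`, and `penalty` are hypothetical, as is the toy alignment. What the sketch shares with the described approach is only the recurrence: the best score for the first j columns is the best score for the first i columns plus the score of the segment covering columns i to j-1, maximized over i.

```python
from collections import Counter
from math import log


def segment_score(seqs, i, j, penalty=1.0):
    """Stand-in score (an assumption, not the paper's mixture model):
    empirical log-likelihood of columns i..j-1 treated as one unit,
    minus a penalty per free parameter of that categorical distribution."""
    counts = Counter(s[i:j] for s in seqs)
    n = len(seqs)
    loglik = sum(c * log(c / n) for c in counts.values())
    return loglik - penalty * (len(counts) - 1)


def optimal_segmentation(seqs, penalty=1.0):
    """Return (segments, score) maximizing the sum of segment scores.

    Segments are half-open column ranges (start, end) that cover the
    alignment without overlapping."""
    length = len(seqs[0])
    best = [0.0] + [float("-inf")] * length  # best[j]: best score for columns 0..j-1
    back = [0] * (length + 1)                # back[j]: start of the last segment
    for j in range(1, length + 1):
        for i in range(j):
            cand = best[i] + segment_score(seqs, i, j, penalty)
            if cand > best[j]:
                best[j], back[j] = cand, i
    # Follow the back-pointers from the end to recover the segmentation.
    segments = []
    j = length
    while j > 0:
        segments.append((back[j], j))
        j = back[j]
    return segments[::-1], best[length]


# Toy alignment: columns 0-1 always agree, columns 2-3 always agree,
# and the two blocks vary independently of each other.
seqs = ["AACC", "AAGG", "TTCC", "TTGG"]
segments, total = optimal_segmentation(seqs)
print(segments)
```

On this toy alignment the sketch recovers the two correlated blocks as separate segments. Because every (i, j) pair is scored once, the recurrence needs a number of score evaluations quadratic in the alignment length; any speedups used in the paper are not reproduced here.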
Taxonomy
Topics: Machine Learning in Bioinformatics · Genomics and Phylogenetic Studies · RNA and protein synthesis mechanisms
