Pair HMM based gap statistics for re-evaluation of indels in alignments with affine gap penalties: Extended Version
Alexander Schönhuth, Raheleh Salari, S. Cenk Sahinalp

TL;DR
This paper introduces a statistical method based on pair hidden Markov models (pair HMMs) for evaluating the significance of gaps in sequence alignments scored with affine gap penalties, which makes indel detection more reliable, especially in twilight zone alignments.
Contribution
It develops efficient algorithms, based on pair HMMs, for testing whether gaps are significant in terms of length and multiplicity, enabling re-evaluation of indels in alignments with affine gap penalties.
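To make the idea of gap-length significance concrete, here is a minimal sketch in Python. It is not the authors' algorithm. It only uses the textbook property that a pair HMM gap state with self-transition (extension) probability `extend_prob` produces geometrically distributed gap lengths, so the chance of a gap of length at least k is `extend_prob ** (k - 1)`. The function names, the `extend_prob` parameter, and the `alpha` threshold are all assumptions made for illustration, and no correction for the number of gaps (multiplicity) is attempted.

```python
def gap_length_tail(k, extend_prob):
    """P(gap length >= k) for a gap state that extends with probability extend_prob.

    Assumes gap lengths are geometric: P(L = k) = (1 - extend_prob) * extend_prob**(k - 1).
    """
    if k < 1:
        raise ValueError("gap length must be at least 1")
    if not 0.0 <= extend_prob < 1.0:
        raise ValueError("extend_prob must lie in [0, 1)")
    return extend_prob ** (k - 1)


def significant_gaps(gap_lengths, extend_prob, alpha=0.05):
    """Return (length, tail probability) for gaps whose tail probability is <= alpha.

    Hypothetical helper: tests each gap on its own, ignoring multiplicity.
    """
    results = []
    for length in gap_lengths:
        p = gap_length_tail(length, extend_prob)
        if p <= alpha:
            results.append((length, p))
    return results


if __name__ == "__main__":
    # Illustrative values only; extend_prob = 0.5 is not taken from the paper.
    for length, p in significant_gaps([1, 3, 6], extend_prob=0.5):
        print(f"gap of length {length}: tail probability {p}")
```

Under these assumptions a longer gap has a smaller tail probability, so it is flagged as less likely to arise by chance. A fuller treatment would also need to account for how many gaps occur in the alignment.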
Findings
Indel reliability increases with gap significance.
Method performs well on structural alignments from SABmark.
Improves indel assessment in twilight zone alignments.
Abstract
Although computationally aligning sequences is a crucial step in the vast majority of comparative genomics studies, our understanding of alignment biases still needs to be improved. To infer true structural or homologous regions, computational alignments need further evaluation. It has been shown that the accuracy of aligned positions can drop substantially, in particular around gaps. Here we focus on re-evaluation of score-based alignments with affine gap penalty costs. We exploit their relationships with pair hidden Markov models and develop efficient algorithms by which to identify gaps that are significant in terms of length and multiplicity. We evaluate our statistics with respect to the well-established structural alignments from SABmark and find that indel reliability substantially increases with their significance, in particular in worst-case twilight zone alignments. This points…
Taxonomy
Topics: Topic Modeling · Machine Learning and Algorithms · Algorithms and Data Compression
