The effectiveness of position- and composition-specific gap costs for protein similarity searches
Aleksandar Stojmirović, E. Michael Gertz, Stephen F. Altschul, and Yi-Kuo Yu

TL;DR
This study compares position- and composition-specific gap costs in protein similarity searches. Position-specific gap costs improve retrieval accuracy over uniform gap costs. For Pfam, iteratively constructed PSSMs perform equivalently to HMMs adjusted to have constant gap transition probabilities, though with much greater variance. Composition-specific gap costs have no effect.
Contribution
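The paper separately quantifies the effect of position- and composition-specific gap scores and compares iteratively constructed PSSMs with the HMMs provided by Pfam and SUPERFAMILY, showing that position-specific gap penalties have an advantage over uniform gap costs.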
Findings
Position-specific gap penalties outperform uniform gap costs.
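For Pfam, PSSMs iteratively constructed from seeds based on HMM consensus sequences perform equivalently to HMMs adjusted to have constant gap transition probabilities, but with much greater variance.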
Composition-specific gap costs do not affect retrieval performance.
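To make the distinction between uniform and position-specific gap costs concrete, here is a minimal sketch, not the authors' method. It assumes a simplified linear gap cost (one value per profile column, no separate open and extend terms) and a toy four-column profile. The names `global_score`, `pssm`, and `gaps`, along with all score values, are hypothetical and chosen only for illustration.

```python
def global_score(pssm, gaps, target, mismatch=-1):
    """Global (Needleman-Wunsch style) score of a profile against a sequence.

    pssm[i] maps a residue to its score at profile column i (assumed toy data).
    gaps[i] is the linear cost of a gap at column i; a list of identical
    values reproduces a uniform gap cost.
    """
    n, m = len(pssm), len(target)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    # Leading target residues left unaligned are charged at the first column.
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] - gaps[0]
    for i in range(1, n + 1):
        # Profile column i-1 is skipped before any target residue is used.
        dp[i][0] = dp[i - 1][0] - gaps[i - 1]
        for j in range(1, m + 1):
            match = dp[i - 1][j - 1] + pssm[i - 1].get(target[j - 1], mismatch)
            delete = dp[i - 1][j] - gaps[i - 1]  # column i-1 has no partner
            insert = dp[i][j - 1] - gaps[i - 1]  # extra residue after column i-1
            dp[i][j] = max(match, delete, insert)
    return dp[n][m]


# Toy profile with one favoured residue per column (assumed values).
pssm = [{"A": 2}, {"C": 2}, {"D": 2}, {"E": 2}]

uniform = global_score(pssm, [3, 3, 3, 3], "ACE")   # 3
specific = global_score(pssm, [3, 3, 1, 3], "ACE")  # 5
```

The target `ACE` lacks the third profile column. With a uniform cost, the deletion is charged the same as it would be anywhere else; with a position-specific cost, a column where gaps are known to be common can be made cheaper to skip, which raises the score of the same alignment.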
Abstract
The flexibility in gap cost enjoyed by Hidden Markov Models (HMMs) is expected to afford them better retrieval accuracy than position-specific scoring matrices (PSSMs). We attempt to quantify the effect of more general gap parameters by separately examining the influence of position- and composition-specific gap scores, as well as by comparing the retrieval accuracy of the PSSMs constructed using an iterative procedure to that of the HMMs provided by Pfam and SUPERFAMILY, curated ensembles of multiple alignments. We found that position-specific gap penalties have an advantage over uniform gap costs. We did not explore optimizing distinct uniform gap costs for each query. For Pfam, PSSMs iteratively constructed from seeds based on HMM consensus sequences perform equivalently to HMMs that were adjusted to have constant gap transition probabilities, albeit with much greater variance. We…
