# Beyond similarity assessment: Selecting the optimal model for sequence alignment via the Factorized Asymptotic Bayesian algorithm

**Authors:** Taikai Takeda, Michiaki Hamada

arXiv: 1705.06911 · 2017-10-17

## TL;DR

This paper introduces a model selection method for Pair Hidden Markov Models (PHMMs) in sequence alignment. It chooses the number of hidden states using the Factorized Information Criterion (FIC) and slightly improves alignment accuracy in simulations.

## Contribution

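The paper presents a model selection method for PHMMs that decides how many hidden states of each type (match, insertion, deletion) to use. It picks the configuration with the highest posterior probability as approximated by FIC, which slightly improves alignment accuracy.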
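FIC differs from BIC in how it penalizes complexity. BIC charges every parameter log N / 2. FIC instead charges each hidden state's parameters by the log of that state's expected number of assigned observations. The sketch below shows this idea in the simpler mixture-model form of FIC, for a fixed variational distribution q over hidden states. Two things are assumptions here: that the PHMM-specific criterion derived in the paper has the same shape, and the function and configuration names, which are hypothetical.

```python
import math


def fic_style_score(expected_loglik, entropy, state_counts, state_dims, shared_dim, n):
    """Simplified, mixture-style FIC score (an assumption, not the paper's PHMM formula).

    expected_loglik: expected complete-data log-likelihood E_q[log p(X, Z | theta_hat)]
    entropy:         entropy H(q) of the variational posterior over hidden states
    state_counts:    expected number of observations assigned to each hidden state
    state_dims:      number of free parameters owned by each hidden state
    shared_dim:      number of parameters shared by all states (penalized with log n)
    n:               total number of observations
    """
    penalty = shared_dim / 2 * math.log(n)
    for count, dim in zip(state_counts, state_dims):
        if count <= 0:
            # A state with no expected observations would normally be removed before scoring.
            raise ValueError("every scored state needs a positive expected count")
        # Each state pays log(its own effective sample size), not log(n) as in BIC.
        penalty += dim / 2 * math.log(count)
    return expected_loglik + entropy - penalty


def bic_style_penalty(total_dim, n):
    """BIC penalty for comparison: every parameter pays log(n) / 2."""
    return total_dim / 2 * math.log(n)


def select_model(candidates):
    """Return the name of the candidate configuration with the highest FIC-style score.

    candidates maps a hypothetical configuration name (e.g. "M2-I1-D1") to the
    keyword arguments of fic_style_score.
    """
    return max(candidates, key=lambda name: fic_style_score(**candidates[name]))
```

Because a rarely used state is charged only the log of its small expected count, FIC penalizes it less than BIC would. The penalty is also computed per state, so configurations with different numbers of match, insertion and deletion states can be compared on one scale.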

## Key findings

- In simulations, the method selects the models with the highest posterior probability.
- Simulations also show slightly improved alignment accuracy.
- On DNA datasets from 5 and 28 species, the method selects more complex models than those used in previous studies.

## Abstract

Pair Hidden Markov Models (PHMMs) are probabilistic models used for pairwise sequence alignment, a quintessential problem in bioinformatics. PHMMs include three types of hidden states: match, insertion and deletion. Most previous studies have used one or two hidden states for each PHMM state type. However, few studies have examined the number of states suitable for representing sequence data or improving alignment accuracy. We developed a novel method to select superior models (including the number of hidden states) for PHMMs. Our method selects models with the highest posterior probability using Factorized Information Criteria (FIC), which is widely utilised in model selection for probabilistic models with hidden variables. Our simulations indicated this method has excellent model selection capabilities with slightly improved alignment accuracy. We applied our method to DNA datasets from 5 and 28 species, ultimately selecting more complex models than those used in previous studies.
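To make the three state types concrete, here is a minimal sketch of the pair-HMM forward algorithm. It allows any number of match (M), insertion (X) and deletion (Y) states and computes P(x, y) summed over all alignments. Several details are assumptions chosen for illustration:

- X emits a character of x against a gap, and Y emits a character of y against a gap.
- There is no explicit end state.
- All parameter values are made up.

The paper's own parameterization and its FAB-based training are not reproduced here.

```python
def phmm_forward(x, y, types, start, trans, emit):
    """Forward algorithm for a pair HMM with any number of M/X/Y states.

    types[k]   : "M" (match), "X" (x aligned to a gap) or "Y" (y aligned to a gap)
    start[k]   : probability of starting in state k
    trans[l][k]: transition probability from state l to state k
    emit[k]    : emission function, called as emit[k](a, b) for M states,
                 emit[k](a) for X states and emit[k](b) for Y states
    Returns P(x, y) summed over all alignments (no explicit end state).
    """
    n, m, num_states = len(x), len(y), len(types)
    # f[k][i][j]: probability of emitting x[:i] and y[:j] and ending in state k
    f = [[[0.0] * (m + 1) for _ in range(n + 1)] for _ in range(num_states)]
    for i in range(n + 1):
        for j in range(m + 1):
            for k, t in enumerate(types):
                if t == "M" and i > 0 and j > 0:
                    pi, pj, e = i - 1, j - 1, emit[k](x[i - 1], y[j - 1])
                elif t == "X" and i > 0:
                    pi, pj, e = i - 1, j, emit[k](x[i - 1])
                elif t == "Y" and j > 0:
                    pi, pj, e = i, j - 1, emit[k](y[j - 1])
                else:
                    continue  # state k cannot end at cell (i, j)
                # The start probability applies only to the very first emission.
                prev = start[k] if (pi, pj) == (0, 0) else 0.0
                prev += sum(f[l][pi][pj] * trans[l][k] for l in range(num_states))
                f[k][i][j] = e * prev
    return sum(f[k][n][m] for k in range(num_states))


# Usage: the classic one-state-per-type PHMM (all parameter values are made up).
def match_emit(a, b):
    return 0.2 if a == b else 0.2 / 12  # 4 identical + 12 mismatched pairs sum to 1


def gap_emit(a):
    return 0.25  # uniform background over A, C, G, T


types = ["M", "X", "Y"]
start = [0.6, 0.2, 0.2]
trans = [[0.6, 0.2, 0.2],   # from M
         [0.7, 0.3, 0.0],   # from X (no direct X -> Y)
         [0.7, 0.0, 0.3]]   # from Y
emit = [match_emit, gap_emit, gap_emit]
print(phmm_forward("ACGT", "AGT", types, start, trans, emit))
```

Adding states to a type is just a matter of extending `types`, `start`, `trans` and `emit`. A larger state set nests a smaller one, so its maximized likelihood never decreases as states are added. This is why a complexity-aware criterion such as FIC, rather than likelihood alone, is needed to choose the number of states.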

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1705.06911/full.md

## Figures

16 figures with captions in the complete paper: https://tomesphere.com/paper/1705.06911/full.md

## References

24 references — full list in the complete paper: https://tomesphere.com/paper/1705.06911/full.md

---
Source: https://tomesphere.com/paper/1705.06911