A unified statistical model of protein multiple sequence alignment integrating direct coupling and insertions
Akira R. Kinjo

TL;DR
This paper introduces a comprehensive statistical model for protein multiple sequence alignments that captures both local and distant residue correlations, as well as insertions, to better understand protein structure and conservation patterns.
Contribution
It develops a lattice gas model based on maximum entropy principles that integrates short-range and long-range interactions, advancing the modeling of protein MSAs.
Findings
Long-range interactions enhance conservation pattern specificity.
Model captures both insertions and correlations in MSAs.
Analysis shows increased stability of conserved residues.
Abstract
The multiple sequence alignment (MSA) of a protein family provides a wealth of information in terms of the conservation pattern of amino acid residues not only at each alignment site but also between distant sites. In order to statistically model the MSA incorporating both short-range and long-range correlations as well as insertions, I have derived a lattice gas model of the MSA based on the principle of maximum entropy. The partition function, obtained by the transfer matrix method with a mean-field approximation, accounts for all possible alignments with all possible sequences. The model parameters for short-range and long-range interactions were determined by a self-consistent condition and by a Gaussian approximation, respectively. Using this model with and without long-range interactions, I analyzed the globin and V-set domains by increasing the "temperature" and by "mutating" a…
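The transfer matrix method mentioned in the abstract can be illustrated on a much simpler system. The sketch below is not the author's lattice gas model: it assumes a toy open chain of `L` sites with `q` states each, site fields `h`, and a single nearest-neighbor coupling matrix `J`. It has no insertions and no long-range, mean-field terms. The function names and the sign convention (weight `exp(h + J)`) are assumptions made for illustration. It shows only why the method is useful: the sum over all `q**L` sequences reduces to `L - 1` matrix-vector products.

```python
import itertools
import numpy as np

def partition_function_transfer(h, J):
    """Partition function of an open chain via transfer matrices.

    h: array of shape (L, q), site fields (hypothetical toy parameters).
    J: array of shape (q, q), coupling between neighboring sites.
    The weight of a sequence s is exp(sum_i h[i, s_i] + sum_i J[s_i, s_{i+1}]).
    """
    L, q = h.shape
    v = np.exp(h[0])          # weights of the first site, one per state
    T = np.exp(J)             # transfer matrix between adjacent sites
    for i in range(1, L):
        # Sum over the previous site's state, then weight the current site.
        v = (v @ T) * np.exp(h[i])
    return v.sum()

def partition_function_brute(h, J):
    """Same quantity by explicit enumeration of all q**L sequences."""
    L, q = h.shape
    Z = 0.0
    for s in itertools.product(range(q), repeat=L):
        log_w = sum(h[i, s[i]] for i in range(L))
        log_w += sum(J[s[i], s[i + 1]] for i in range(L - 1))
        Z += np.exp(log_w)
    return Z

rng = np.random.default_rng(0)
h = rng.normal(size=(5, 3))
J = rng.normal(size=(3, 3))
Z_tm = partition_function_transfer(h, J)
Z_bf = partition_function_brute(h, J)
```

On this small instance the two values agree to floating-point precision. The brute-force cost grows as `q**L`, while the transfer-matrix cost grows as `L * q**2`.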
