Aligning biological sequences by exploiting residue conservation and coevolution
Anna Paola Muntoni, Andrea Pagnani, Martin Weigt, Francesco Zamponi

TL;DR
This paper introduces DCAlign, an efficient sequence alignment algorithm that incorporates coevolution signals, improving alignment accuracy for proteins and RNA without relying on structural data.
Contribution
The paper presents DCAlign, a novel alignment method that models coevolution among sequence positions using an approximate message-passing approach, surpassing traditional profile models.
Findings
DCAlign outperforms profile-based methods on simulated data.
It effectively aligns real protein and RNA sequences.
The method captures coevolution signals without structural information.
Abstract
Sequences of nucleotides (for DNA and RNA) or amino acids (for proteins) are central objects in biology. Among the most important computational problems is that of sequence alignment, i.e. arranging sequences from different organisms in such a way as to identify similar regions, to detect evolutionary relationships between sequences, and to predict biomolecular structure and function. This is typically addressed through profile models, which capture position-specificities like conservation in sequences, but assume an independent evolution of different positions. Over the last years, it has been well established that coevolution of different amino-acid positions is essential for maintaining three-dimensional structure and function. Modeling approaches based on inverse statistical physics can capture the coevolution signal in sequence ensembles; and they are now widely used in predicting…
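The contrast the abstract draws between profile models (independent positions) and coevolution-aware models can be illustrated with a toy scoring sketch. Everything below is an assumption made for illustration: the seed alignment, the pseudocount, and the function names are hypothetical, and the pairwise term is a simple log-odds correction, not the inferred couplings or the message-passing procedure that DCAlign actually uses.

```python
import math
from itertools import combinations

# Hypothetical toy seed alignment: columns 0 and 1 covary (A with U, G with C).
SEED = ["AU", "GC", "AU", "GC"]
ALPHABET = "ACGU"
PSEUDOCOUNT = 0.5  # assumed smoothing constant, avoids log(0)


def site_freqs(seqs, i):
    """Pseudocount-smoothed frequency of each symbol in column i."""
    total = len(seqs) + PSEUDOCOUNT * len(ALPHABET)
    return {a: (sum(s[i] == a for s in seqs) + PSEUDOCOUNT) / total
            for a in ALPHABET}


def pair_freqs(seqs, i, j):
    """Pseudocount-smoothed joint frequency of symbol pairs in columns i, j."""
    total = len(seqs) + PSEUDOCOUNT * len(ALPHABET) ** 2
    return {(a, b): (sum(s[i] == a and s[j] == b for s in seqs) + PSEUDOCOUNT) / total
            for a in ALPHABET for b in ALPHABET}


def profile_score(seq, seqs):
    """Log-probability of seq when every column is treated independently."""
    return sum(math.log(site_freqs(seqs, i)[a]) for i, a in enumerate(seq))


def pairwise_score(seq, seqs):
    """Profile score plus a log-odds term for every pair of columns."""
    score = profile_score(seq, seqs)
    for i, j in combinations(range(len(seq)), 2):
        fi, fj = site_freqs(seqs, i), site_freqs(seqs, j)
        fij = pair_freqs(seqs, i, j)
        score += math.log(fij[(seq[i], seq[j])] / (fi[seq[i]] * fj[seq[j]]))
    return score


if __name__ == "__main__":
    for candidate in ("AU", "AC"):
        print(candidate,
              round(profile_score(candidate, SEED), 3),
              round(pairwise_score(candidate, SEED), 3))
```

In this toy setting the profile score cannot tell "AU" from "AC", because each column on its own sees A, G, U and C equally often. The pairwise score prefers "AU", which matches the covariation in the seed. This is the kind of signal a coevolution-aware aligner can exploit and a profile model cannot.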
