On the entropy of protein families
John Barton (DCE-MIT, MIT), Arup Chakraborty (MIT, MIT, DCE-MIT), Simona Cocco (LPS), Hugo Jacquin (LPS), Rémi Monasson (LPTENS)

TL;DR
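This paper estimates the entropy of protein families using Hidden Markov Models and inferred statistical models that reproduce the 1- and 2-point statistics of multi-sequence alignments. It also computes the entropic cost of constraints, such as fixing one amino acid at a specific site, and relates that cost to the escape probability of HIV.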
Contribution
It compares several approaches to estimating protein family entropy and quantifies the entropic cost of constraints, linking that cost to the escape probability of HIV.
Findings
Entropy varies across protein families and models.
Constraints reduce entropy, affecting protein evolution.
Entropy estimates relate to viral escape probabilities.
Abstract
Proteins are essential components of living systems, capable of performing a huge variety of tasks at the molecular level, such as recognition, signalling, copying, and transport. The protein sequences realizing a given function may vary widely across organisms, giving rise to a protein family. Here, we estimate the entropy of those families based on different approaches, including Hidden Markov Models used for protein databases and inferred statistical models reproducing the low-order (1- and 2-point) statistics of multi-sequence alignments. We also compute the entropic cost, that is, the loss in entropy resulting from a constraint acting on the protein, such as the fixation of one particular amino acid at a specific site, and relate this notion to the escape probability of HIV. The case of lattice proteins, for which the entropy can be computed exactly, allows us to provide…
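To make the two quantities in the abstract concrete, here is a minimal sketch on a toy model, not the authors' method. It assumes a small pairwise (Potts-like) model whose sequence space can be enumerated by brute force, and it reads the entropic cost as the entropy of the full distribution minus the entropy of the distribution conditioned on one site carrying a given residue. The function names, the alphabet size `q`, the length `L`, and the random parameters are all hypothetical; real protein families have 20 amino acids and lengths far beyond exhaustive enumeration.

```python
import itertools
import math
import random


def enumerate_model(L, q, fields, couplings):
    """Return {sequence: probability} for a toy pairwise model by brute force.

    fields[i][a] and couplings[i][j][a][b] (used for i < j) are hypothetical
    parameters; a sequence's weight is exp(sum of fields + sum of couplings).
    """
    weights = {}
    for seq in itertools.product(range(q), repeat=L):
        score = sum(fields[i][seq[i]] for i in range(L))
        score += sum(couplings[i][j][seq[i]][seq[j]]
                     for i in range(L) for j in range(i + 1, L))
        weights[seq] = math.exp(score)
    z = sum(weights.values())  # partition function
    return {s: w / z for s, w in weights.items()}


def entropy(probs):
    """Shannon entropy in nats: S = -sum_s p(s) log p(s)."""
    return -sum(p * math.log(p) for p in probs.values() if p > 0)


def entropic_cost(probs, site, residue):
    """Entropy lost when `site` is constrained to `residue` (assumed definition)."""
    kept = {s: p for s, p in probs.items() if s[site] == residue}
    norm = sum(kept.values())
    conditioned = {s: p / norm for s, p in kept.items()}
    return entropy(probs) - entropy(conditioned)


# Toy usage: 4 sites, 3 residue types, random parameters with a fixed seed.
rng = random.Random(0)
L, q = 4, 3
fields = [[rng.gauss(0, 1) for _ in range(q)] for _ in range(L)]
couplings = [[[[rng.gauss(0, 0.5) for _ in range(q)] for _ in range(q)]
              for _ in range(L)] for _ in range(L)]

probs = enumerate_model(L, q, fields, couplings)
print("entropy (nats):", entropy(probs))
print("cost of fixing site 0 to residue 1:", entropic_cost(probs, 0, 1))
```

With all fields and couplings set to zero the distribution is uniform, so the entropy is `L * log(q)` and fixing any site costs exactly `log(q)`. With couplings present, the cost of a constraint also depends on how strongly the fixed site restricts the other sites.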
