Maximum entropy models for antibody diversity
Thierry Mora, Aleksandra Walczak, William Bialek, Curtis G. Callan, Jr

TL;DR
This study builds maximum entropy models from pairwise correlations between residue positions to describe IgM antibody sequence diversity in zebrafish. The models predict collective properties of the repertoire that are inconsistent with models in which amino acid substitutions occur independently at each site.
Contribution
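It constructs maximum entropy models that are fit only to pairwise correlations between residue positions, yet correctly capture the higher-order statistical properties of the repertoire. Interpreting these models as statistical physics problems yields predictions about the collective properties of immune repertoire diversity.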
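For orientation, the generic textbook form of a pairwise maximum entropy distribution over a sequence of L residues can be written as below. The symbols (\sigma_i for the residue at position i, h_i for single-site fields, J_{ij} for pairwise couplings, Z for the normalization) are notation assumed here for illustration; the paper's exact parametrization may differ.

```latex
P(\sigma_1,\dots,\sigma_L)
  = \frac{1}{Z}\exp\Big[\sum_{i} h_i(\sigma_i) + \sum_{i<j} J_{ij}(\sigma_i,\sigma_j)\Big],
\qquad
Z = \sum_{\sigma_1,\dots,\sigma_L}
    \exp\Big[\sum_{i} h_i(\sigma_i) + \sum_{i<j} J_{ij}(\sigma_i,\sigma_j)\Big]
```

Setting every J_{ij} to zero recovers a model in which each site is drawn independently, which is the kind of baseline the pairwise model is compared against.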
Findings
The distribution of sequences obeys Zipf's law
The repertoire decomposes into several clusters
Correlations massively restrict diversity
Abstract
Recognition of pathogens relies on families of proteins showing great diversity. Here we construct maximum entropy models of the sequence repertoire, building on recent experiments that provide a nearly exhaustive sampling of the IgM sequences in zebrafish. These models are based solely on pairwise correlations between residue positions, but correctly capture the higher order statistical properties of the repertoire. Exploiting the interpretation of these models as statistical physics problems, we make several predictions for the collective properties of the sequence ensemble: the distribution of sequences obeys Zipf's law, the repertoire decomposes into several clusters, and there is a massive restriction of diversity due to the correlations. These predictions are completely inconsistent with models in which amino acid substitutions are made independently at each site, and are in good…
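The claim that correlations restrict diversity can be illustrated with a toy calculation. The sketch below is not the authors' code or data: it builds a small pairwise model with randomly chosen, hypothetical fields `h` and couplings `J`, enumerates every sequence exactly, and compares its entropy with that of an independent-site model sharing the same single-site frequencies. The sizes `L = 4` and `q = 3` are arbitrary choices that keep enumeration cheap.

```python
import itertools
import numpy as np

def pairwise_maxent_distribution(h, J):
    """Enumerate all sequences and return their probabilities.

    h: array (L, q) of single-site fields (hypothetical values).
    J: array (L, L, q, q) of pairwise couplings; only entries with i < j are used.
    """
    L, q = h.shape
    states = list(itertools.product(range(q), repeat=L))
    log_w = np.empty(len(states))
    for k, s in enumerate(states):
        e = sum(h[i, s[i]] for i in range(L))
        e += sum(J[i, j, s[i], s[j]] for i in range(L) for j in range(i + 1, L))
        log_w[k] = e
    w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
    return states, w / w.sum()

def entropy_bits(p):
    """Shannon entropy in bits, skipping zero-probability entries."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(0)
L, q = 4, 3  # toy sizes, far smaller than real antibody sequences
h = rng.normal(size=(L, q))
J = rng.normal(scale=2.0, size=(L, L, q, q))

states, p = pairwise_maxent_distribution(h, J)

# Single-site marginals of the pairwise model.
marg = np.zeros((L, q))
for s, ps in zip(states, p):
    for i in range(L):
        marg[i, s[i]] += ps

S_pair = entropy_bits(p)                                 # entropy with correlations
S_ind = sum(entropy_bits(marg[i]) for i in range(L))     # independent-site entropy

print(f"pairwise model entropy:   {S_pair:.3f} bits")
print(f"independent-site entropy: {S_ind:.3f} bits")
```

The independent-site entropy is always at least as large as the joint entropy, so the gap between the two numbers measures how much the couplings reduce the effective number of sequences in this toy setting.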
