Expanding functional protein sequence space using high entropy generative models

Roberto Netti; Emily Hinds; Francesco Calvanese; Rama Ranganathan; Martin Weigt; Francesco Zamponi

arXiv:2605.03578 · q-bio.QM · May 6, 2026

TL;DR

This study compares different Boltzmann Machine models for designing artificial proteins, showing that high-entropy models generate larger, more neutral sequence spaces and better capture evolutionary landscapes.

Contribution

It demonstrates that high-entropy Boltzmann Machines represent protein fitness landscapes better than low-entropy models and generate functional artificial sequences drawn from a much larger sequence space.

Findings

1. High-entropy models sample a sequence space more than fifteen orders of magnitude larger than that of low-entropy models.
2. All models tested can produce functional enzymes with high success rates.
3. High-entropy models better capture local neutral spaces and reduce overfitting.
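
To express the first finding in entropy units, here is a back-of-the-envelope conversion. It assumes, as is common for maximum-entropy sequence models, that a model's entropy S (in nats) sets the effective number of sequences it samples, N_eff ≈ e^S:

```latex
\frac{N_{\mathrm{eff}}^{\mathrm{high}}}{N_{\mathrm{eff}}^{\mathrm{low}}}
  \approx e^{S_{\mathrm{high}} - S_{\mathrm{low}}} \gtrsim 10^{15}
\quad\Longrightarrow\quad
\Delta S = S_{\mathrm{high}} - S_{\mathrm{low}} \gtrsim 15 \ln 10 \approx 34.5\ \text{nats} \approx 49.8\ \text{bits}
```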

Abstract

Boltzmann Machines trained on evolutionary sequence data have emerged as a powerful paradigm for the data-driven design of artificial proteins. However, the relationship between model architecture, specifically parameter density, and experimental performance remains poorly understood. Here, we investigate this relationship using the Chorismate Mutase enzyme family as a model system. We compare standard fully connected Boltzmann Machines for Direct Coupling Analysis (bmDCA) with sparse models generated via progressive edge activation (eaDCA) and edge decimation (edDCA). We identify a maximum-entropy model (meDCA) along the decimation trajectory that represents an optimal balance between constraint satisfaction and the flexibility of the probability distribution. We synthesized and tested artificial sequences from all models using an in vivo complementation assay, finding that all…
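
The models compared in the abstract all share one functional form. The sketch below assumes the standard Potts parameterization used in DCA. A sequence s = (s_1, ..., s_L) over q states has probability P(s) ∝ exp(-E(s)), with E(s) = -Σ_i h_i(s_i) - Σ_(i,j) J_ij(s_i, s_j). bmDCA keeps a coupling J_ij for every pair of sites, while the sparse eaDCA/edDCA models keep couplings only on a subset of edges. `SparsePottsModel` and `metropolis_sample` are hypothetical names. The parameters are random toy values, not a model fitted to Chorismate Mutase sequences, and the training step (fitting h and J to alignment statistics) is not shown. Single-site Metropolis sampling is a generic choice and not necessarily the sampler the authors used.

```python
import math
import random


class SparsePottsModel:
    """Toy Potts model whose couplings live only on an explicit edge set (hypothetical helper)."""

    def __init__(self, L, q, fields, couplings):
        # fields[i][a]: local field h_i(a) for site i and state a
        # couplings[(i, j)][a][b]: J_ij(a, b), stored once per active edge with i < j
        self.L, self.q = L, q
        self.fields = fields
        self.couplings = couplings
        # neighbor lists so a single-site update only touches active edges
        self.neighbors = {i: [] for i in range(L)}
        for (i, j) in couplings:
            self.neighbors[i].append(j)
            self.neighbors[j].append(i)

    def coupling(self, i, j, a, b):
        # J_ij(a, b) with symmetric access; pairs without an edge contribute zero
        if (i, j) in self.couplings:
            return self.couplings[(i, j)][a][b]
        if (j, i) in self.couplings:
            return self.couplings[(j, i)][b][a]
        return 0.0

    def energy(self, seq):
        # E(s) = -sum_i h_i(s_i) - sum_{(i,j) in edges} J_ij(s_i, s_j)
        e = -sum(self.fields[i][seq[i]] for i in range(self.L))
        for (i, j), J in self.couplings.items():
            e -= J[seq[i]][seq[j]]
        return e

    def local_delta(self, seq, i, new):
        # E(mutant) - E(seq) for setting site i to `new`, using only i's neighbors
        old = seq[i]
        d = -(self.fields[i][new] - self.fields[i][old])
        for j in self.neighbors[i]:
            d -= self.coupling(i, j, new, seq[j]) - self.coupling(i, j, old, seq[j])
        return d


def metropolis_sample(model, n_sweeps, rng):
    """Draw one sequence with single-site Metropolis updates at unit temperature."""
    seq = [rng.randrange(model.q) for _ in range(model.L)]
    for _ in range(n_sweeps):
        for i in range(model.L):
            new = rng.randrange(model.q)
            d = model.local_delta(seq, i, new)
            # accept with probability min(1, exp(-dE))
            if d <= 0 or rng.random() < math.exp(-d):
                seq[i] = new
    return seq


# Usage on a toy model: 5 sites, 3 states, couplings on 3 of the 10 possible pairs
rng = random.Random(0)
L, q = 5, 3
fields = [[rng.gauss(0, 1) for _ in range(q)] for _ in range(L)]
edges = [(0, 1), (1, 2), (3, 4)]
couplings = {e: [[rng.gauss(0, 0.5) for _ in range(q)] for _ in range(q)] for e in edges}
model = SparsePottsModel(L, q, fields, couplings)
s = metropolis_sample(model, n_sweeps=200, rng=rng)
print(s, model.energy(s))
```

Storing couplings in a dictionary keyed by edge makes the sparse case natural. A missing edge contributes zero energy, so going from a fully connected model to a decimated one only changes which keys are present. Protein models typically use q = 21 (20 amino acids plus a gap symbol).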
