Proteina: Scaling Flow-based Protein Structure Generative Models
Tomas Geffner, Kieran Didi, Zuobai Zhang, Danny Reidenbach, Zhonglin Cao, Jason Yim, Mario Geiger, Christian Dallago, Emine Kucukbenli, Arash Vahdat, Karsten Kreis

TL;DR
Proteina introduces a scalable flow-based model for protein backbone generation, leveraging hierarchical conditioning and advanced training techniques to produce diverse, long, and designable proteins with state-of-the-art performance.
Contribution
The paper presents Proteina, a large-scale flow-based protein generator with hierarchical conditioning, novel training strategies, and new metrics for evaluating protein structure generation.
Findings
Achieves state-of-the-art performance in de novo protein backbone design.
Generates diverse proteins up to 800 residues long.
Provides high-level control over secondary structures and fold-specific features.
Abstract
Recently, diffusion- and flow-based generative models of protein structures have emerged as a powerful tool for de novo protein design. Here, we develop Proteina, a new large-scale flow-based protein backbone generator that utilizes hierarchical fold class labels for conditioning and relies on a tailored scalable transformer architecture with up to 5x as many parameters as previous models. To meaningfully quantify performance, we introduce a new set of metrics that directly measure the distributional similarity of generated proteins with reference sets, complementing existing metrics. We further explore scaling training data to millions of synthetic protein structures and explore improved training and sampling recipes adapted to protein backbone generation. This includes fine-tuning strategies like LoRA for protein backbones, new guidance methods like classifier-free guidance and…
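For readers unfamiliar with classifier-free guidance in a flow-based sampler, the following is a minimal sketch of the generic blending rule, not the paper's implementation. The abstract is truncated here, so the exact guidance formulation Proteina uses is not shown. The function names `guided_velocity` and `euler_step`, the guidance weight `w`, and the toy velocity values are all illustrative assumptions.

```python
def guided_velocity(v_cond, v_uncond, w):
    """Generic classifier-free guidance blend (an assumption, not Proteina's exact rule).

    Returns v_uncond + w * (v_cond - v_uncond) per coordinate:
    w = 0 gives the unconditional velocity, w = 1 the conditional one,
    and w > 1 extrapolates past the conditional prediction.
    """
    return [u + w * (c - u) for c, u in zip(v_cond, v_uncond)]


def euler_step(x, v, dt):
    """One explicit Euler step of the sampling ODE dx/dt = v."""
    return [xi + dt * vi for xi, vi in zip(x, v)]


# Toy values standing in for the outputs of a conditional and an
# unconditional network evaluated at the same point and time.
v_cond = [1.0, 2.0]
v_uncond = [0.5, 0.0]

x = [0.0, 0.0]
x_next = euler_step(x, guided_velocity(v_cond, v_uncond, w=2.0), dt=0.1)
print(x_next)
```

In a real sampler the two velocities would come from the same network evaluated with and without the fold class label, and the step would be repeated over a time grid.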
Peer Reviews
Decision: ICLR 2025 Oral
1. The paper introduces novel metrics that address previously omitted distribution-level aspects of protein generation, which is both valuable and innovative, allowing for a more comprehensive evaluation of model performance. Additionally, the scaling of both training data and model aligns with the evolution of the field of protein generation. 2. The paper proposes an innovative $t$ sampling method that effectively captures the unique characteristics of protein data. This is also the first appli
In line 119, a partial derivative seems mistakenly written as a total derivative, and the divergence is incorrectly labeled as a gradient. I believe the right form of the continuity equation should be like $\partial p_t(\boldsymbol x_t)/\partial t=-\nabla_{\boldsymbol x_t}\cdot(p_t(\boldsymbol x_t)\boldsymbol u_t(\boldsymbol x_t))$. Additionally, the differential symbol should be formatted in upright type, as $\mathrm{d}$, to follow standard conventions.
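The form of the continuity equation the reviewer gives can be checked numerically on a toy case. The sketch below is an illustration under stated assumptions, not something taken from the paper: it uses a one-dimensional Gaussian path $p_t = \mathcal{N}(0, \sigma_t^2)$ with the assumed schedule $\sigma_t = 1 + t$, for which the velocity field $u_t(x) = x\,\dot\sigma_t/\sigma_t$ transports the density. In one dimension the divergence reduces to a single spatial derivative, so the distinction between divergence and gradient only becomes visible in higher dimensions.

```python
import math


def sigma(t):
    # Assumed schedule: the standard deviation grows linearly in time.
    return 1.0 + t


def density(x, t):
    # p_t(x) for a zero-mean Gaussian with standard deviation sigma(t).
    s = sigma(t)
    return math.exp(-0.5 * (x / s) ** 2) / (s * math.sqrt(2.0 * math.pi))


def velocity(x, t):
    # u_t(x) = x * sigma'(t) / sigma(t), with sigma'(t) = 1 for this schedule.
    return x / sigma(t)


def flux(x, t):
    # Probability flux p_t(x) * u_t(x).
    return density(x, t) * velocity(x, t)


def continuity_residual(x, t, h=1e-4):
    # Central finite differences for dp/dt and d(p u)/dx.
    dp_dt = (density(x, t + h) - density(x, t - h)) / (2.0 * h)
    div_flux = (flux(x + h, t) - flux(x - h, t)) / (2.0 * h)
    # The continuity equation says dp/dt + div(p u) = 0.
    return dp_dt + div_flux


for x in (-1.5, 0.0, 0.7, 2.0):
    print(x, continuity_residual(x, 0.3))
```

The printed residuals should be zero up to finite-difference error, which is consistent with the partial time derivative on the left and the divergence of the flux on the right.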
1. The paper is certainly well written and I enjoyed reading it. 2. The paper makes several interesting and important explorations and observations. For example, though AF3 already observes the equivariant vs. non-equivariant properties, it would be nice to further explore the scalability with non-equivariant transformers; the autoguidance part of generation also provides some new insights into protein structure generation; studying protein structure generation at scale is also an
1. Although the architecture is scaled up, it would be better to understand the training in a more systematic way, e.g. through scaling laws. A trained flow matching model can in general still provide the corresponding likelihood; could Proteina also conduct a likelihood evaluation over the protein structures? Is it possible to study the scaling laws based on that? 2. The notation in Table 1 for models with different configs is not very clear, which makes it hard to read and analyze. I also sugge
1. Very well-written paper and very easy to follow. 2. The authors show that large-scale non-equivariant flow models also succeed on unconditional protein structure generation.
1. The authors claim to significantly outperform all previous works; however, evidence supporting this assertion is not found in the experimental results table. Excluding unconditional models, there are no direct competitors, and comparisons can only be made with unconditional results. Even if the bold results are accepted as outperforming based on FPSD, FS, fJSD, and TM-score metrics, this model exhibits the lowest diversity. 2. RFdiffusion, ESM3, and Genie 2 were trained on different datasets,
Taxonomy
Topics: Genetics, Bioinformatics, and Biomedical Research
Methods: Sparse Evolutionary Training
