SimpleFold: Folding Proteins is Simpler than You Think
Yuyang Wang, Jiarui Lu, Navdeep Jaitly, Josh Susskind, Miguel Angel Bautista

TL;DR
SimpleFold demonstrates that a straightforward transformer-based model, trained with a flow-matching objective, can achieve competitive protein folding performance without complex domain-specific modules.
Contribution
This work introduces SimpleFold, a novel protein folding model using only general transformer blocks and flow-matching training, challenging the necessity of complex domain-specific architectures.
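A minimal sketch of one flow-matching training step follows. It assumes a linear (rectified-flow style) path between Gaussian noise and the target coordinates, with the time t drawn uniformly. The function names and the `model(x_t, t)` signature are hypothetical. The sequence conditioning (e.g. PLM embeddings), the time schedule, and the additional structural loss term that SimpleFold uses are not reproduced here.

```python
import random

Coord = list[float]  # (x, y, z) coordinates of one atom


def interpolate(x0: list[Coord], x1: list[Coord], t: float) -> list[Coord]:
    """Point on the straight path x_t = (1 - t) * x0 + t * x1 (noise at t=0, structure at t=1)."""
    return [[(1 - t) * a + t * b for a, b in zip(p0, p1)] for p0, p1 in zip(x0, x1)]


def target_velocity(x0: list[Coord], x1: list[Coord]) -> list[Coord]:
    """Time derivative of the straight path, x1 - x0, which does not depend on t."""
    return [[b - a for a, b in zip(p0, p1)] for p0, p1 in zip(x0, x1)]


def flow_matching_loss(model, x1: list[Coord], rng: random.Random) -> float:
    """One Monte Carlo estimate of the flow-matching loss for a single structure x1.

    `model(x_t, t)` is a hypothetical stand-in for the network; it must return one
    predicted velocity vector per atom.
    """
    x0 = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in x1]  # Gaussian noise sample
    t = rng.random()  # uniform time in [0, 1)
    x_t = interpolate(x0, x1, t)
    v_pred = model(x_t, t)
    v_true = target_velocity(x0, x1)
    # Mean squared error over all atoms and coordinates.
    sq = [(p - q) ** 2 for vp, vt in zip(v_pred, v_true) for p, q in zip(vp, vt)]
    return sum(sq) / len(sq)
```

Along the straight path the regression target x1 - x0 is constant in t, so computing the target for each training sample is trivial.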
Findings
Achieves competitive results on standard benchmarks
Performs well in ensemble prediction tasks
Efficient inference on consumer hardware
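Ensemble prediction with a flow-matching model amounts to integrating the learned velocity field from several independent noise draws. The sketch below assumes a plain forward-Euler ODE solver and a hypothetical `velocity(x, t)` callable standing in for the trained network. SimpleFold's actual sampler and step count are not described here. The sketch also makes inference cost explicit: each sample costs `n_steps` network evaluations.

```python
import random

Structure = list[list[float]]  # one (x, y, z) row per atom


def euler_sample(velocity, n_atoms: int, n_steps: int, seed: int) -> Structure:
    """Integrate dx/dt = velocity(x, t) from t = 0 (noise) to t = 1 with forward Euler."""
    rng = random.Random(seed)  # each seed gives an independent starting point
    x = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(n_atoms)]
    dt = 1.0 / n_steps
    for k in range(n_steps):
        v = velocity(x, k * dt)  # one network evaluation per step
        x = [[xi + dt * vi for xi, vi in zip(p, q)] for p, q in zip(x, v)]
    return x


def sample_ensemble(velocity, n_atoms: int, n_steps: int, seeds: list[int]) -> list[Structure]:
    """Draw one structure per seed; with an ODE sampler, diversity comes only from the noise draws."""
    return [euler_sample(velocity, n_atoms, n_steps, s) for s in seeds]


def make_straight_line_velocity(target: Structure):
    """Toy velocity field (hypothetical) that moves every point in a straight line onto `target` by t = 1."""
    def velocity(x: Structure, t: float) -> Structure:
        return [[(b - a) / (1.0 - t) for a, b in zip(p, q)] for p, q in zip(x, target)]
    return velocity
```

With the toy field every seed lands on the same `target`, because that field has a single endpoint. A trained model that has captured conformational variability would instead map different seeds to different plausible structures.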
Abstract
Protein folding models have achieved groundbreaking results typically via a combination of integrating domain knowledge into the architectural blocks and training pipelines. Nonetheless, given the success of generative models across different but related problems, it is natural to question whether these architectural designs are a necessary condition to build performant models. In this paper, we introduce SimpleFold, the first flow-matching based protein folding model that solely uses general purpose transformer blocks. Protein folding models typically employ computationally expensive modules involving triangular updates, explicit pair representations or multiple training objectives curated for this specific domain. Instead, SimpleFold employs standard transformer blocks with adaptive layers and is trained via a generative flow-matching objective with an additional structural term. We…
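The abstract's "standard transformer blocks with adaptive layers" can be illustrated with DiT-style adaptive LayerNorm (adaLN). In adaLN, the scale, shift and residual gate of each block are predicted from a conditioning vector, such as an embedding of the flow time t. This is a pure-Python sketch under that assumption. The abstract does not say exactly which signals SimpleFold's adaptive layers condition on, and all names and the parameter layout here are hypothetical.

```python
import math


def layer_norm(x: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize a feature vector to zero mean and unit variance (no learned affine)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def linear(w: list[list[float]], b: list[float], c: list[float]) -> list[float]:
    """Dense layer: w has one row per output feature, c is the conditioning vector."""
    return [sum(wi * ci for wi, ci in zip(row, c)) + bi for row, bi in zip(w, b)]


def ada_layer_norm(x, cond, w_scale, b_scale, w_shift, b_shift) -> list[float]:
    """Adaptive LayerNorm: LN(x) * (1 + scale) + shift, with scale and shift predicted from cond."""
    scale = linear(w_scale, b_scale, cond)
    shift = linear(w_shift, b_shift, cond)
    return [h * (1 + s) + m for h, s, m in zip(layer_norm(x), scale, shift)]


def adaln_residual(x, cond, sublayer, params: dict) -> list[float]:
    """Residual update x + gate * sublayer(adaLN(x)), with the gate also predicted from cond.

    In DiT's adaLN-Zero variant the gate projection starts at zero, so each block
    is initially the identity map.
    """
    h = ada_layer_norm(x, cond, params["w_scale"], params["b_scale"],
                       params["w_shift"], params["b_shift"])
    gate = linear(params["w_gate"], params["b_gate"], cond)
    return [xi + g * yi for xi, g, yi in zip(x, gate, sublayer(h))]
```

Conditioning through LayerNorm modulation keeps the block itself a standard transformer block: no pair representation or attention bias is needed to inject the time signal.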
Peer Reviews
Decision: ICLR 2026 Poster
The primary strength is **architectural simplification**. The work demonstrates that a general-purpose transformer can achieve comparable performance to more complex, bespoke architectures, provided it is leveraged at a large scale and conditioned on a powerful pretrained PLM. The clear empirical validation of scaling laws for both model and data size is a useful, albeit expected, finding.
**Limited Novelty**: The approach is highly derivative. It combines a standard transformer with a known flow-matching objective and, most importantly, relies heavily on the powerful ESM2-3B PLM. Similar combinations have already appeared in previous work, so the result is not surprising and the model can hardly be called a new folding method. **Insignificant Performance**: In the context of the rapidly advancing protein design field, the results are not a significant breakthrough. The model achieves "competitive" or "compa
• SimpleFold effectively removes complex components commonly used in previous protein folding models—such as MSA processing, pairwise representations, and triangle modules—leading to significant computational speedup while maintaining competitive prediction accuracy. • The model shows promising results in ensemble generation, highlighting its potential for capturing conformational diversity. • The paper is clearly structured and easy to follow.
• The scope of SimpleFold remains limited to single-chain protein structure prediction. Given the emergence of efficient and generalizable predictors such as ESMFold and Protenix-Mini [1]—which are also capable of predicting biomolecular complexes—SimpleFold does not demonstrate a clear advantage in either efficiency or accuracy, raising questions about its practical added value. • The overall framework constitutes a relatively straightforward integration of existing techniques from computer vi
- The core argument—that domain-specific inductive biases might be replaceable by scale and general architectures—is highly stimulating and challenges conventional wisdom in protein folding. If validated further, this could significantly simplify model design in structural biology and beyond. - SimpleFold's reliance solely on standard Diffusion Transformer blocks makes the architecture remarkably simple compared to those with attention bias in AF3. This generality facilitates leveraging advances
- While competitive, SimpleFold-3B does not consistently surpass the strongest models such as AlphaFold2, especially on the challenging CASP14 benchmark. This suggests that current domain-specific designs and/or MSA information still provide an edge, particularly for difficult targets. - The SimpleFold 1.6B and 3B models achieve accuracy on par with ESMFold, but their inference is slower. This presents a critical trade-off, and the justification for using these
