Scaling Atomistic Protein Binder Design with Generative Pretraining and Test-Time Compute
Kieran Didi, Zuobai Zhang, Guoqing Zhou, Danny Reidenbach, Zhonglin Cao, Sooyoung Cha, Tomas Geffner, Christian Dallago, Jian Tang, Michael M. Bronstein, Martin Steinegger, Emine Kucukbenli, Arash Vahdat, Karsten Kreis

TL;DR
Proteina-Complexa is a fully atomistic protein binder generation method that unifies generative modeling with test-time optimization and reports state-of-the-art in-silico results in computational binder design.
Contribution
A unified framework that integrates generative pretraining with test-time optimization for atomistic protein binder design and reports higher in-silico success rates than existing methods.
Findings
Higher in-silico success rates than previous approaches
Test-time optimization strategies outperform hallucination methods
Binder generation can be guided effectively by interface hydrogen bonds and by fold class
Abstract
Protein interaction modeling is central to protein design, which has been transformed by machine learning with applications in drug discovery and beyond. In this landscape, structure-based de novo binder design is cast as either conditional generative modeling or sequence optimization via structure predictors ("hallucination"). We argue that this is a false dichotomy and propose Proteina-Complexa, a novel fully atomistic binder generation method unifying both paradigms. We extend recent flow-based latent protein generation architectures and leverage the domain-domain interactions of monomeric computationally predicted protein structures to construct Teddymer, a new large-scale dataset of synthetic binder-target pairs for pretraining. Combined with high-quality experimental multimers, this enables training a strong base model. We then perform inference-time optimization with this…
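As a rough illustration of what spending test-time compute on a generative model can look like, the sketch below shows generic best-of-N sampling: draw several candidates from a sampler and keep the one a scoring function ranks highest. This is not the procedure from the paper, whose details are cut off above. The names `best_of_n`, `toy_sample`, and `toy_score` are hypothetical, the sampler stands in for a generative binder model, and the scorer stands in for a structure-predictor-based confidence score.

```python
import random
from typing import Callable, List, Tuple

Candidate = List[float]


def best_of_n(
    sample: Callable[[random.Random], Candidate],
    score: Callable[[Candidate], float],
    n: int,
    seed: int = 0,
) -> Tuple[Candidate, float]:
    """Draw n candidates and return the highest-scoring one with its score."""
    if n < 1:
        raise ValueError("n must be at least 1")
    rng = random.Random(seed)
    best: Candidate = []
    best_score = float("-inf")
    for _ in range(n):
        candidate = sample(rng)   # hypothetical stand-in for a generative model
        s = score(candidate)      # hypothetical stand-in for an in-silico score
        if s > best_score:
            best, best_score = candidate, s
    return best, best_score


def toy_sample(rng: random.Random) -> Candidate:
    # Toy "design": 8 random coordinates in [-1, 1].
    return [rng.uniform(-1.0, 1.0) for _ in range(8)]


def toy_score(x: Candidate) -> float:
    # Toy objective: higher is better, maximal (0.0) at the origin.
    return -sum(v * v for v in x)


if __name__ == "__main__":
    for n in (1, 8, 64):
        _, s = best_of_n(toy_sample, toy_score, n=n, seed=0)
        print(f"n={n:3d}  best score={s:.4f}")
```

With a fixed seed, increasing `n` can never lower the best score, because the larger run sees every candidate of the smaller run plus additional ones. More elaborate search strategies trade this simplicity for better use of the same sampling budget.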
