AlphaFold distillation for inverse protein design
Igor Melnyk, Aurélie Lozano, Payel Das, Vijil Chenthamarakshan

TL;DR
This paper distills AlphaFold's confidence metrics into a fast, differentiable model. That model supplies structural feedback while training models that design protein sequences to fold into specific 3D structures.
Contribution
The novelty is applying knowledge distillation to AlphaFold confidence metrics (pTM or pLDDT) to obtain a fast, differentiable model. This model serves as a structure-consistency regularizer when training inverse protein design models.
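The distillation step can be illustrated with a toy, pure-Python sketch. Everything here is a hypothetical stand-in, not the authors' setup:

- `slow_teacher_score` is a made-up function playing the role of an expensive AlphaFold pTM-like score.
- The student is a logistic regression on amino-acid composition, far simpler than a real distilled network.

The sketch shows only the pattern: query the slow teacher once, offline, then fit a cheap differentiable student to its scores by minimising mean squared error.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Amino-acid composition: fraction of each of the 20 residues (student input features)."""
    counts = [0.0] * len(AMINO_ACIDS)
    for aa in seq:
        counts[AMINO_ACIDS.index(aa)] += 1.0
    return [c / len(seq) for c in counts]


def slow_teacher_score(seq):
    """HYPOTHETICAL stand-in for an expensive AlphaFold confidence (pTM-like) score in [0, 1].
    It is NOT AlphaFold; it simply rewards hydrophobic content so there is a signal to learn."""
    hydrophobic = sum(seq.count(aa) for aa in "AILMFVW") / len(seq)
    return 1.0 / (1.0 + math.exp(-8.0 * (hydrophobic - 0.4)))


def student_score(weights, bias, feats):
    """Fast, differentiable student: logistic regression on composition features."""
    z = bias + sum(w * f for w, f in zip(weights, feats))
    return 1.0 / (1.0 + math.exp(-z))


def distill(sequences, epochs=300, lr=2.0):
    """Fit the student to the teacher's scores by full-batch gradient descent on MSE."""
    targets = [slow_teacher_score(s) for s in sequences]  # teacher queried once, offline
    feats = [composition(s) for s in sequences]
    weights, bias = [0.0] * len(AMINO_ACIDS), 0.0
    n = len(sequences)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, y in zip(feats, targets):
            p = student_score(weights, bias, x)
            # d/dz of (p - y)^2 with p = sigmoid(z) is 2 (p - y) p (1 - p)
            g = 2.0 * (p - y) * p * (1.0 - p)
            grad_b += g
            for j, xj in enumerate(x):
                grad_w[j] += g * xj
        bias -= lr * grad_b / n
        weights = [w - lr * gw / n for w, gw in zip(weights, grad_w)]
    return weights, bias


# Toy usage: 200 random 50-residue sequences labelled by the hypothetical teacher.
random.seed(0)
train_set = ["".join(random.choice(AMINO_ACIDS) for _ in range(50)) for _ in range(200)]
student_w, student_b = distill(train_set)
```

Once fitted, `student_score` costs only a few arithmetic operations per call and has analytic gradients. That is what makes a distilled model usable inside a training loop where calling the teacher would be too slow.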
Findings
The method improves sequence recovery by up to 3% compared to non-regularized baselines.
It increases protein diversity by up to 45% while maintaining structural consistency.
The approach is versatile and applicable to other design tasks like protein infilling.
Abstract
Inverse protein folding, the process of designing sequences that fold into a specific 3D structure, is crucial in bio-engineering and drug discovery. Traditional methods rely on experimentally resolved structures, but these cover only a small fraction of protein sequences. Forward folding models like AlphaFold offer a potential solution by accurately predicting structures from sequences. However, these models are too slow for integration into the optimization loop of inverse folding models during training. To address this, we propose using knowledge distillation on folding model confidence metrics, such as pTM or pLDDT scores, to create a faster and end-to-end differentiable distilled model. This model can then be used as a structure consistency regularizer in training the inverse folding model. Our technique is versatile and can be applied to other design tasks, such as sequence-based…
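The regularized training objective can be sketched as follows. The sketch assumes a loss of the form cross-entropy plus `lam * (1 - score)`, where `score` is the frozen distilled model's predicted confidence. Several details are assumptions rather than the paper's choices:

- the weighting `lam`;
- the input representation;
- how the designed sequence is kept differentiable. Here the designer's per-position softmax probabilities are averaged into a soft amino-acid composition.

`distilled_score` is a hypothetical logistic model, not the paper's network. The sketch computes only the scalar loss. In practice an autodiff framework would backpropagate it through the frozen student into the designer's logits.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def softmax(logits):
    """Numerically stable softmax over one position's 20 amino-acid logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_composition(probs):
    """Soft amino-acid composition: average of the per-position probability vectors.
    Unlike argmax decoding, this is a smooth function of the logits."""
    length = len(probs)
    return [sum(p[j] for p in probs) / length for j in range(len(AMINO_ACIDS))]


def distilled_score(feats, weights, bias):
    """HYPOTHETICAL frozen distilled model predicting a pTM-like score in [0, 1]."""
    z = bias + sum(w * f for w, f in zip(weights, feats))
    return 1.0 / (1.0 + math.exp(-z))


def regularized_loss(logits, native_seq, weights, bias, lam=0.5):
    """Sequence-recovery cross-entropy plus an assumed structure-consistency term lam * (1 - score)."""
    probs = [softmax(row) for row in logits]
    ce = -sum(math.log(p[AMINO_ACIDS.index(aa)]) for p, aa in zip(probs, native_seq)) / len(native_seq)
    score = distilled_score(expected_composition(probs), weights, bias)
    return ce + lam * (1.0 - score)


# Toy usage: three positions, uniform logits, a frozen student with zero weights.
uniform_logits = [[0.0] * len(AMINO_ACIDS) for _ in range(3)]
print(regularized_loss(uniform_logits, "ACD", [0.0] * len(AMINO_ACIDS), 0.0))  # log(20) + 0.25, about 3.2457
```

`expected_composition` works on probabilities rather than argmax-decoded residues, so every term of the loss is a smooth function of the logits. Decoding to a discrete sequence first would cut the gradient path from the regularizer back to the designer.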
Taxonomy
Topics: Protein Structure and Dynamics · RNA and protein synthesis mechanisms · Machine Learning in Bioinformatics
