Knowledge Distillation of a Protein Language Model Yields a Foundational Implicit Solvent Model
Justin Airas, Bin Zhang

TL;DR
This paper introduces an implicit solvent model for proteins, built by distilling knowledge from a protein language model (ESM3) into a graph neural network (GNN). The resulting potential supports accurate, transferable, and efficient simulations of protein folding and of intrinsically disordered proteins.
Contribution
The study presents a data-driven implicit solvent model (ISM) that distills a protein language model into a graph neural network, addressing the limited accuracy of traditional analytical ISMs.
Findings
The GNN potential enables stable, long-timescale molecular dynamics simulations.
The hybrid model accurately reproduces protein folding free-energy landscapes.
The approach predicts structural ensembles of intrinsically disordered proteins.
Abstract
Implicit solvent models (ISMs) promise to deliver the accuracy of explicit solvent simulations at a fraction of the computational cost. However, despite decades of development, their accuracy has remained insufficient for many critical applications, particularly for simulating protein folding and the behavior of intrinsically disordered proteins. Developing a transferable, data-driven ISM that overcomes the limitations of traditional analytical formulas remains a central challenge in computational chemistry. Here we address this challenge by introducing a novel strategy that distills the evolutionary information learned by a protein language model, ESM3, into a computationally efficient graph neural network (GNN). We show that this GNN potential, trained on effective energies from ESM3, is robust enough to drive stable, long-timescale molecular dynamics simulations. When combined with a…
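The distillation idea in the abstract, a cheap student model trained to reproduce energies from an expensive teacher, can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' implementation: `teacher_energy` is a hypothetical harmonic function standing in for ESM3-derived effective energies, and the student is a three-parameter linear model on pairwise-distance features standing in for the GNN. All function names are invented for this example.

```python
import numpy as np

def pair_distances(coords):
    """Unique pairwise distances for an (n_atoms, 3) coordinate array."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    rows, cols = np.triu_indices(len(coords), k=1)
    return dist[rows, cols]

def teacher_energy(coords):
    """Hypothetical teacher: a toy harmonic energy (assumption, not ESM3)."""
    d = pair_distances(coords)
    return float(((d - 1.5) ** 2).sum())

def student_features(coords):
    """Rotation- and translation-invariant features: [1, sum d, sum d^2]."""
    d = pair_distances(coords)
    return np.array([1.0, d.sum(), (d ** 2).sum()])

def fit_student(conformations, energies):
    """Fit student weights to teacher energies by linear least squares."""
    X = np.stack([student_features(c) for c in conformations])
    weights, *_ = np.linalg.lstsq(X, np.asarray(energies), rcond=None)
    return weights

def student_energy(weights, coords):
    """Energy predicted by the distilled student for one conformation."""
    return float(student_features(coords) @ weights)

# Label random toy conformations with the teacher, then distill.
rng = np.random.default_rng(0)
train = [rng.normal(size=(4, 3)) for _ in range(50)]
held_out = [rng.normal(size=(4, 3)) for _ in range(10)]
weights = fit_student(train, [teacher_energy(c) for c in train])

# Compare student and teacher on conformations not seen during fitting.
test_error = max(
    abs(student_energy(weights, c) - teacher_energy(c)) for c in held_out
)
print(f"max held-out error: {test_error:.2e}")
```

The student here can represent the toy teacher exactly, so the held-out error is at the level of floating-point noise. In a realistic setting the student would be a neural network trained by gradient descent, and a molecular dynamics engine would use the negative gradient of the student energy with respect to coordinates as forces.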
Taxonomy
Topics: Machine Learning in Materials Science · Protein Structure and Dynamics · Quantum many-body systems
