Developing a Machine-Learning Interatomic Potential for Non-Covalent Interactions in Proteins
Lejia Zeng (1, 2), Xintong Zhang (1, 2), Yuchan Pei (1, 2), Lifeng Zhao (2), Lan Hua (2), Jincai Yang (2), Niu Huang (1, 2). Affiliations: (1) Tsinghua Institute of Multidisciplinary Biomedical Research, Tsinghua University, Beijing, China; (2) National Institute of Biological Sciences

TL;DR
This paper introduces PANIP, a machine learning interatomic potential (MLIP) trained on non-covalent interactions between protein-derived fragments. It retains quantum mechanical (QM) accuracy and outperforms existing models such as ANI-2x in predicting diverse molecular interactions.
Contribution
The paper presents PANIP, a novel ensemble MLIP trained with active learning on protein fragment interactions, offering high accuracy and transferability for modeling non-covalent interactions in proteins.
Findings
PANIP achieves <0.2 kcal/mol MAE on out-of-distribution systems.
PANIP outperforms ANI-2x, especially for charged dimers.
PANIP enables QM-level protein-ligand binding energy estimation at near-force-field cost.
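The error metric behind these findings, the mean absolute error (MAE) of predicted interaction energies against QM reference values, can be sketched as follows. This is a minimal illustration and not the authors' evaluation code: the function name `mae_by_group`, the group labels, and all numbers are hypothetical placeholders chosen only to show a per-category breakdown (for example, charged versus neutral dimers).

```python
from collections import defaultdict
from statistics import fmean


def mae_by_group(predicted, reference, groups):
    """Return the mean absolute error per group label.

    predicted, reference: interaction energies in kcal/mol.
    groups: one label per dimer (hypothetical labels such as "charged").
    """
    if not (len(predicted) == len(reference) == len(groups)):
        raise ValueError("all inputs must have the same length")
    errors = defaultdict(list)
    for pred, ref, group in zip(predicted, reference, groups):
        errors[group].append(abs(pred - ref))
    return {group: fmean(values) for group, values in errors.items()}


# Made-up energies for illustration only (kcal/mol); not data from the paper.
predicted = [-5.1, -12.3, -0.8, -95.0]
reference = [-5.0, -12.5, -0.9, -95.4]
groups = ["neutral", "neutral", "neutral", "charged"]

result = mae_by_group(predicted, reference, groups)
print(result)
```

Reporting the error per category rather than as a single pooled number makes it easier to see whether a model degrades on a specific class of interactions.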
Abstract
Machine learning interatomic potentials (MLIPs) enable efficient modeling of molecular interactions with quantum mechanical (QM) accuracy. However, constructing robust and representative training datasets that capture subtle, system-specific interaction motifs remains challenging. We introduce PANIP (PAirwise Non-covalent Interaction Potential), an ensemble MLIP model built upon the NequIP framework and trained on non-covalent interactions (NCIs) between protein-derived fragments. PANIP is trained using an automated multi-fidelity active learning (MFAL) workflow, in which a representative training subset, termed PDB-FRAGID (PDB Fragment Interaction Dataset), is distilled from an otherwise prohibitively large pool of fragment dimers extracted from the Protein Data Bank (PDB). PANIP retains ωB97X-D3BJ/def2-TZVPP-level accuracy and achieves mean absolute errors below 0.2 kcal/mol…
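To illustrate how an ensemble can drive active learning in general, the sketch below uses query-by-committee selection: candidates on which the ensemble members disagree most are sent for reference labeling. This is a generic technique shown under stated assumptions, not the MFAL workflow described in the paper, whose details are not given here. The function name `select_uncertain`, the threshold, the batch size, and the energies are all hypothetical.

```python
from statistics import pstdev


def select_uncertain(ensemble_predictions, threshold, max_batch):
    """Pick candidates whose ensemble disagreement exceeds a threshold.

    ensemble_predictions: dict mapping a candidate id to the list of
        energies (kcal/mol) predicted by each ensemble member.
    threshold: minimum standard deviation to count as uncertain (assumed).
    max_batch: maximum number of candidates returned per iteration (assumed).
    """
    scored = [
        (pstdev(preds), candidate)
        for candidate, preds in ensemble_predictions.items()
    ]
    uncertain = [item for item in scored if item[0] > threshold]
    # Most uncertain first, so the labeling budget goes where models disagree most.
    uncertain.sort(reverse=True)
    return [candidate for _, candidate in uncertain[:max_batch]]


# Made-up predictions from a four-member ensemble; not data from the paper.
pool = {
    "dimer_a": [-3.0, -3.1, -2.9, -3.0],
    "dimer_b": [-10.0, -8.5, -11.2, -9.1],
    "dimer_c": [-1.0, -1.6, -0.5, -1.2],
}

selected = select_uncertain(pool, threshold=0.3, max_batch=2)
print(selected)
```

In a loop of this kind, the selected dimers would be labeled with the reference QM method, added to the training set, and the ensemble retrained until few candidates exceed the threshold.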
Taxonomy
Topics: Machine Learning in Materials Science · Protein Structure and Dynamics · Computational Drug Discovery Methods
