TL;DR
This paper introduces a novel machine learning-based representation of local protein environments derived from atomistic foundation models, capturing structural and chemical features effectively, and enabling improved biomolecular modeling and NMR chemical shift prediction.
Contribution
The work presents a new AFM-derived embedding for local protein environments that captures structural and chemical features, and enables a physics-informed chemical shift predictor with state-of-the-art accuracy.
Findings
Effective embedding captures local structure and chemical features.
Representation space exhibits meaningful structure for biomolecular environments.
Enables state-of-the-art NMR chemical shift prediction.
Abstract
The local structure of a protein strongly impacts its function and interactions with other molecules. Therefore, a concise, informative representation of a local protein environment is essential for modeling and designing proteins and biomolecular interactions. However, these environments' extensive structural and chemical variability makes them challenging to model, and such representations remain under-explored. In this work, we propose a novel representation for a local protein environment derived from the intermediate features of atomistic foundation models (AFMs). We demonstrate that this embedding effectively captures both local structure (e.g., secondary motifs) and chemical features (e.g., amino-acid identity and protonation state). We further show that the AFM-derived representation space exhibits meaningful structure, enabling the construction of data-driven priors over the…
Peer Reviews
Decision·ICLR 2026 Poster
- Repurposing latent features of MLFFs as canonical protein descriptors is a timely, well-motivated idea that links quantum-level atomistic modeling with biomolecular representation learning.
- The paper covers a solid range of downstream tasks tied to experimental observables and provides thorough analysis of the physical plausibility of its predictions.
- Dataset and baselines: For the pKa and NMR shift tasks, the baselines are evaluated in conditions that differ from their intended use, whereas the MLFF-feature models introduced here are trained directly for the target objective. For instance, the pKa baselines are designed to predict experimental values, while the proposed methods are trained to reproduce a cheaper computational reference. This creates a benchmark mismatch, since the baselines are not optimized for the reference chosen in thi
1. Repurposing MLFFs as "foundation models" for structural biology is innovative and valuable.
2. The design of the validation experiments (e.g., ring current effect, helix unfolding) sets a commendable standard for physical realism.
3. The paper effectively demonstrates the general-purpose nature of the embeddings for zero-shot clustering and generative guidance.
1. The paper's central claims rest on comparisons against pKa-ANI and UCBShift2-X that appear to be factually incorrect or based on flawed implementations.
2. The $pK_{a}$ evaluation is missing the entire 2024/2025 SOTA, invalidating its performance claims.
3. The paper compares its structural embeddings against sequence (ESM) embeddings. It critically fails to benchmark against the most obvious and relevant competitors: other structural embeddings, namely those from the AlphaFold2 or ESMFold str
This paper rigorously evaluates multiple popular MLFF methods across benchmarks and connects them to several protein-structure-related tasks. It also performs an interesting analysis of chemical shift prediction.
The evaluation benchmarks, such as secondary structure and amino acid type prediction, are somewhat straightforward. While MLFFs naturally learn the physics of local structural environments, there are other machine-learning-based approaches that reason over the local structure and are suitable for predicting protonation state, secondary structure, and amino acid types. This work does not benchmark MLFFs against these methods on representing local structure. [1] Simulating 500 million years of evolution with
