# Quantified Dynamics-Property Relationships: Data-Efficient Protein Engineering with Machine Learning of Protein Dynamics

**Authors:** T. Emme Burgin

PMC · DOI: 10.1021/acs.jcim.5c01813 · Journal of Chemical Information and Modeling · 2025-10-22

## TL;DR

This paper introduces a machine learning method that relates descriptors of protein dynamics, learned from molecular dynamics simulations, to a small number of experimental labels in order to guide protein engineering.

## Contribution

A novel framework that uses quantified dynamics-property relationships to optimize protein variants with limited experimental data.

## Key findings

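- Given the same amount of experimental data, the method outperforms alternative supervised approaches to machine learning-guided directed evolution at obtaining highly optimized variants.
- Dynamics-property relationships built from only a handful of experimentally labeled sequences accurately predict the key residues that determine a property, even when neither the simulations nor the experimental data alone would reveal them.
- The framework pairs descriptors from deep neural networks, trained on molecular dynamics simulations of variants of the protein of interest, with experimental labels to guide protein engineering.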

## Abstract

Machine learning has proven to be very powerful for predicting mutation effects in proteins, but the simplest approaches require a substantial amount of training data. Because experiments to collect training data are often expensive, time-consuming, and/or otherwise limited, alternatives that make good use of small amounts of data to guide protein engineering are of high potential value. One potential alternative to large-scale benchtop experiments for collecting training data is high-throughput molecular dynamics simulation; however, to date, this source of data has been largely absent from the literature. Here, I introduce a new method for selecting desirable protein variants based on quantified relationships between a small number of experimentally determined labels and descriptors of their dynamic properties. These descriptors are provided by deep neural networks trained on data from molecular dynamics simulations of variants of the protein of interest. I demonstrate that this approach can obtain very highly optimized variants based on small amounts of experimental data, outperforming alternative supervised approaches to machine learning-guided directed evolution with the same amount of experimental data. Furthermore, I show that quantified dynamics-property relationships based on only a handful of experimentally labeled example sequences can be used to accurately predict the key residues that are most relevant to determining the property in question, even when that information could not have been known or predicted based on either the molecular dynamics simulations or the experimental data alone. This work establishes a new and practical framework for incorporating general protein dynamics information from simulations of mutants to guide protein engineering.
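To make the general idea concrete, the sketch below shows one simple way a relationship between per-variant dynamics descriptors and a handful of experimental labels could be quantified and then used to rank unlabeled variants. It is an illustration only and not the paper's implementation: the abstract does not specify the model form, so ridge regression is swapped in as a stand-in, and the descriptor arrays, the function names `fit_ridge` and `rank_variants`, and all numbers are hypothetical, synthetic placeholders.

```python
import numpy as np


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression on centered data; returns (weights, intercept)."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    yc = y - y_mean
    n_features = X.shape[1]
    # Solve (Xc^T Xc + alpha * I) w = Xc^T yc for the weight vector w.
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


def rank_variants(X_labeled, y_labeled, X_candidates, alpha=1.0):
    """Fit on the few labeled variants, then rank candidates by predicted property."""
    w, b = fit_ridge(X_labeled, y_labeled, alpha)
    scores = X_candidates @ w + b
    order = np.argsort(-scores)  # candidate indices, best predicted first
    return order, scores


# Synthetic placeholder data (assumption): in practice each row would be a
# descriptor vector for one variant, e.g. produced by a network trained on
# molecular dynamics simulations, and y would hold measured property values.
rng = np.random.default_rng(0)
n_descriptors = 8
hidden_w = rng.normal(size=n_descriptors)           # synthetic ground truth
X_labeled = rng.normal(size=(12, n_descriptors))    # a handful of labeled variants
y_labeled = X_labeled @ hidden_w + 0.1 * rng.normal(size=12)
X_candidates = rng.normal(size=(200, n_descriptors))  # unlabeled variants

order, scores = rank_variants(X_labeled, y_labeled, X_candidates)
print("Top 5 candidate indices:", order[:5])
```

The regularization strength `alpha` matters more than usual in this regime, because the number of labeled variants can be close to or smaller than the number of descriptors; any real application would need to choose it (and the model family) with care.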

## Full-text entities

- **Genes:** GABBR1 (gamma-aminobutyric acid type B receptor subunit 1) [NCBI Gene 2550] {aka GABABR1, GABBR1-3, GB1, GPRC3A, NEDLC}
- **Chemicals:** AvGFP (-), Glu (MESH:D018698), H (MESH:D006859), water (MESH:D014867), carbons (MESH:D002244), Lys (MESH:D008239)
- **Species:** Homo sapiens (human, species) [taxon 9606], Streptococcus (genus) [taxon 1301]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12606628/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12606628/full.md

## References

50 references — full list in the complete paper: https://tomesphere.com/paper/PMC12606628/full.md

---
Source: https://tomesphere.com/paper/PMC12606628