# Dynamic protein structures in solution: decoding the amide I band with 2D-IR spectral libraries and machine learning

**Authors:** Amy L. Farmer, Kelly Brown, Sophie E. T. Kendall-Price, Partha Malakar, Gregory M. Greetham, Neil T. Hunt

PMC · DOI: 10.1039/d5sc09973k · Chemical Science · 2026-01-08

## TL;DR

This paper combines ultrafast 2D-IR spectroscopy of the amide I band with machine learning to quantify protein secondary structure in label-free aqueous solution within minutes.

## Contribution

The approach pairs a 2D-IR spectral library (6732 spectra of 35 proteins in H2O) with Support-Vector Machine (SVM) models for rapid, quantitative, top-down analysis of protein structure in solution.

## Key findings

- SVM models classified unknown protein samples by structural content and quantified α-helix and β-sheet content with an RMS error of ≤7%.
- The 2D-IR fingerprint also shows potential for predicting the number and length of helices in a protein and for identifying parallel and antiparallel β-sheets.
- The results lay the groundwork for rapid, quantitative analysis of dynamic protein structures under physiologically relevant conditions.
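
The ≤7% figure is a root-mean-square (RMS) error between predicted and reference secondary-structure content. As a minimal sketch of how such a number is computed, assuming the error is expressed in percentage points and using made-up values (the function name `rms_error` and every number below are illustrative, not taken from the paper):

```python
import math


def rms_error(predicted, reference):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical alpha-helix content (%) for four proteins: model output
# versus a reference assignment. These values are invented for illustration.
predicted_helix = [42.0, 10.0, 65.0, 30.0]
reference_helix = [45.0, 6.0, 60.0, 30.0]

print(f"RMS error: {rms_error(predicted_helix, reference_helix):.2f} percentage points")
```

Because the deviations are squared before averaging, a single badly predicted protein raises the RMS error more than several small misses do.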

## Abstract

The dynamic three-dimensional structures of proteins dictate their function, but accessing structures in solution at physiological temperatures is challenging. Ultrafast 2D-IR spectroscopy of the protein amide I band produces a spectral fingerprint that derives directly from the 3D backbone structure within minutes, using microlitres of label-free samples, in aqueous (H2O) solution and with picosecond time resolution. However, transforming 2D-IR fingerprints into quantitative, solution-phase protein structures relies on decoding the fundamental link between the atomistic structure and the 2D spectrum. We demonstrate a top-down approach to solution-phase protein structure determination that combines 2D-IR spectral libraries with machine learning (ML). Using a dataset consisting of 6732 spectra of 35 proteins in H2O that span a range of structures, Support-Vector Machine (SVM) models classified unknown protein samples according to structural content and measured quantities of α-helix and β-sheet with an RMS error of ≤7%. The potential for hybrid 2D-IR-ML tools to predict the number and length of helices in a protein, and identify the presence of parallel and antiparallel β-sheets from the 2D-IR fingerprint is also demonstrated. These results lay the groundwork for rapid, quantitative analysis of dynamic protein structures under physiologically relevant conditions.

Ultrafast 2D-IR spectroscopy and machine learning combine to determine label-free protein secondary structures in solution.
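
The classification step named in the abstract can be illustrated with a toy linear SVM. This is a sketch under stated assumptions, not the authors' pipeline: this summary does not give the kernel, the spectral features, or the software used. Each "spectrum" here is reduced to two invented intensity values, and the helper names `train_linear_svm` and `predict` are hypothetical.

```python
def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=200):
    """Fit a linear soft-margin SVM by sub-gradient descent on the hinge loss.

    X: list of feature vectors; y: labels, +1 or -1.
    Returns the weight vector w and the bias b.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # L2 regularisation shrinks the weights on every step.
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Samples inside the margin (or misclassified) move the boundary.
            if margin < 1.0:
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
                b += eta * yi
    return w, b


def predict(w, b, x):
    """Return +1 or -1 according to the side of the separating hyperplane."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Invented two-number "fingerprints": [helix-band intensity, sheet-band intensity].
# Label +1 stands for a helix-rich sample, -1 for a sheet-rich sample.
X_train = [
    [1.0, 0.2], [0.9, 0.3], [0.8, 0.1], [1.0, 0.4],
    [0.2, 1.0], [0.3, 0.9], [0.1, 0.8], [0.4, 1.0],
]
y_train = [1, 1, 1, 1, -1, -1, -1, -1]

w, b = train_linear_svm(X_train, y_train)
print(predict(w, b, [0.95, 0.25]), predict(w, b, [0.25, 0.95]))
```

Real 2D-IR spectra have far more than two features, so an established SVM implementation with validation on proteins held out of training would be the more realistic setup; the toy version only shows the margin-based decision rule.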

## Full-text entities

- **Chemicals:** H2O (MESH:D014867), amide (MESH:D000577)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12794349/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12794349/full.md

## References

64 references — full list in the complete paper: https://tomesphere.com/paper/PMC12794349/full.md

---
Source: https://tomesphere.com/paper/PMC12794349