# Learning data-efficient coarse-grained molecular dynamics from forces and noise

**Authors:** Aleksander E. P. Durumeric, Yaoyi Chen, Aldo S. Pasos-Trejo, Frank Noé, Cecilia Clementi

PMC · DOI: 10.1038/s41467-026-70818-0 · Nature Communications · 2026-03-15

## TL;DR

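This paper introduces a training scheme for machine-learned coarse-grained (MLCG) molecular dynamics models that combines force-matching with denoising objectives from generative diffusion models, cutting the atomistic training data required by up to two orders of magnitude.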

## Contribution

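The approach integrates denoising objectives with traditional force-matching, unifying MLCG training with the principles of generative diffusion models and reducing the amount of force-labeled atomistic data needed to train coarse-grained models.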

## Key findings

- The method reduces atomistic data requirements by up to two orders of magnitude.
- It enables stable and physically consistent force field training for diverse protein systems.
- The framework bridges molecular dynamics simulation and generative learning, broadening the applicability of MLCG models to large biomolecular systems.

## Abstract

Molecular dynamics (MD) simulations are essential for elucidating biomolecular function, yet the computational cost of all-atom models often limits their reach. Machine-learned coarse-grained (MLCG) models offer a solution by simplifying the representation while maintaining near-atomistic accuracy. However, the training of MLCG models currently requires vast amounts of force-labeled sample conformations from reference atomistic MD. Here, we overcome this limitation by unifying the training of MLCG models with the principles of generative diffusion models. We demonstrate that accurate high-dimensional distributions of molecular ensembles can be recovered by integrating traditional force-matching with denoising objectives. This framework enables the construction of physically consistent and stable force fields while reducing atomistic data requirements by up to two orders of magnitude. Validated across diverse protein folds and scales, our work establishes a bridge between molecular dynamics simulation and modern generative learning, substantially lowering the computational cost of constructing accurate MLCG models and broadening their applicability to large biomolecular systems.

Machine-learned coarse-grained models are a tool for efficient simulation of biomolecular systems but need large amounts of training data. Here, the authors present a training scheme that integrates denoising objectives to enable stable force field training with lower data requirements.
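
To make the idea of combining force-matching with a denoising objective concrete, here is a minimal sketch on a toy system. It is not the authors' implementation: the 1D harmonic well, the one-parameter force model, and the names `model_force`, `sigma`, and `weight` are all assumptions made for illustration. The denoising term uses the standard denoising score-matching target, scaled by `kT` to turn a score into a force; the paper's exact objective and weighting may differ.

```python
import numpy as np

def model_force(theta, x):
    """Hypothetical one-parameter CG force model: F(x) = -theta * x."""
    return -theta * x

def force_matching_loss(theta, x, f_ref):
    """Mean squared error between model forces and reference forces."""
    return float(np.mean((model_force(theta, x) - f_ref) ** 2))

def denoising_loss(theta, x, sigma, kT, rng):
    """Denoising score-matching loss written in force units (assumed form).

    For x_noisy = x + sigma * eps, the regression target for the score of
    the noise-perturbed density is -eps / sigma. Multiplying by kT converts
    that score into a force for a Boltzmann-distributed ensemble.
    """
    eps = rng.standard_normal(x.shape)
    x_noisy = x + sigma * eps
    target = -kT * eps / sigma
    return float(np.mean((model_force(theta, x_noisy) - target) ** 2))

def combined_loss(theta, x, f_ref, sigma, kT, weight, rng):
    """Force-matching loss plus a weighted denoising loss (weight is assumed)."""
    fm = force_matching_loss(theta, x, f_ref)
    dn = denoising_loss(theta, x, sigma, kT, rng)
    return fm + weight * dn

# Toy data: Boltzmann samples of U(x) = 0.5 * k_true * x**2 with exact forces.
rng = np.random.default_rng(0)
k_true, kT = 2.0, 1.0
x = rng.normal(0.0, np.sqrt(kT / k_true), size=1000)
f_ref = -k_true * x

loss_at_truth = force_matching_loss(k_true, x, f_ref)
loss_off_truth = force_matching_loss(1.0, x, f_ref)
total = combined_loss(k_true, x, f_ref, sigma=0.1, kT=kT, weight=0.5, rng=rng)
```

In this sketch the force-matching term needs force labels for every sample, while the denoising term only needs the sampled conformations, which gives some intuition for how a denoising objective could supplement scarce force-labeled data.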

## Full-text entities

- **Genes:** DDX53 (DEAD-box helicase 53) [NCBI Gene 168400] {aka CAGE, CT26}, SPOCK1 (SPARC (osteonectin), cwcv and kazal like domains proteoglycan 1) [NCBI Gene 6695] {aka SPOCK, TESTICAN, TIC1}
- **Diseases:** CG (MESH:D014202)
- **Chemicals:** Trp (MESH:D014364), Cl- (MESH:D002713), carbon (MESH:D002244), T (MESH:D014316), MLCG (-), water (MESH:D014867), amino acid (MESH:D000596), salt (MESH:D012492), Na+ (MESH:D012964)
- **Mutations:** K12M, 355K by D
- **Cell lines:** NTL9 — Homo sapiens (Human), Induced pluripotent stem cell (CVCL_RG56)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12992897/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12992897/full.md

## References

22 references — full list in the complete paper: https://tomesphere.com/paper/PMC12992897/full.md

---
Source: https://tomesphere.com/paper/PMC12992897