# Enhanced Representation-Based Sampling for the Efficient Generation of Data Sets for Machine-Learned Interatomic Potentials

**Authors:** Moritz R. Schäfer, Johannes Kästner

PMC · DOI: 10.1021/acs.jctc.5c01767 · Journal of Chemical Theory and Computation · 2026-02-02

## TL;DR

This paper introduces enhanced representation-based sampling (ERBS), an enhanced sampling method that efficiently generates structurally diverse training data for machine-learned interatomic potentials. It yields accurate models from significantly smaller data sets.

## Contribution

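ERBS identifies collective variables automatically by dimensionality reduction of atomic descriptors. It then applies a bias potential inspired by on-the-fly probability enhanced sampling (OPES) along those variables, which samples diverse atomic configurations for training machine-learned potentials.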
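
The two ingredients above can be illustrated with a minimal sketch. It is not the authors' ERBS implementation. It rests on these assumptions:

- Per-structure descriptor vectors are precomputed. They stand in for the Gaussian moment descriptors used in the paper.
- Plain PCA serves as the dimensionality reduction.
- The bias uses the standard OPES form `V(s) = (1 - 1/gamma)/beta * ln(P(s)/Z + eps)`, with a simple unweighted Gaussian kernel density estimate. ERBS is only described as *inspired by* OPES, so its bias may differ.

All function names, toy data and parameter values are hypothetical.

```python
import numpy as np

# Hypothetical sketch, NOT the authors' ERBS code.
# Assumption: descriptors are precomputed, one vector per structure.


def fit_pca(descriptors, n_components=2):
    """Return the descriptor mean and the leading principal directions."""
    mean = descriptors.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(descriptors - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(descriptors, mean, components):
    """Map descriptors onto low-dimensional collective variables (CVs)."""
    return (descriptors - mean) @ components.T


def kde(s, centers, sigma):
    """Unnormalized Gaussian kernel density estimate at CV point(s) s."""
    s = np.atleast_2d(s)
    diff = (s[:, None, :] - centers[None, :, :]) / sigma
    return np.exp(-0.5 * np.sum(diff**2, axis=-1)).mean(axis=1)


def opes_like_bias(s, centers, sigma, beta, gamma, barrier):
    """OPES-style bias V(s) = (1 - 1/gamma)/beta * ln(P(s)/Z + eps).

    Assumption: Z is the mean KDE value over the kernel centers, i.e. over
    the region explored so far. Where P -> 0 the bias tends to -barrier,
    lowering the energy of unexplored regions and driving exploration.
    """
    prefactor = (1.0 - 1.0 / gamma) / beta
    eps = np.exp(-beta * barrier / (1.0 - 1.0 / gamma))
    z = kde(centers, centers, sigma).mean()
    return prefactor * np.log(kde(s, centers, sigma) / z + eps)


rng = np.random.default_rng(0)
# Toy "descriptors": 200 structures, 10 features, varying mostly along 2 directions.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 10))
descriptors = latent @ mixing + 0.01 * rng.normal(size=(200, 10))

mean, components = fit_pca(descriptors, n_components=2)
cvs = project(descriptors, mean, components)

beta = 1.0 / 2.494  # 1/(k_B T) in mol/kJ at 300 K
params = dict(sigma=0.5, beta=beta, gamma=10.0, barrier=20.0)
bias_explored = opes_like_bias(cvs[:1], cvs, **params)  # at a sampled structure
bias_far = opes_like_bias(np.array([[1e3, 1e3]]), cvs, **params)  # far from all data
print(bias_explored, bias_far)  # bias_far approaches -barrier (-20 kJ/mol)
```

The `barrier` parameter bounds how far the bias can lower the energy of unexplored regions. This is the usual reason for preferring an OPES-type bias over an unbounded one.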

## Key findings

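- Models trained only on short ERBS-biased trajectories reconstruct free energy surfaces with high fidelity. This is demonstrated on alanine dipeptide.
- For liquid water, models trained on ERBS-generated data reproduce the self-diffusion coefficients of a reference model with a significantly smaller data set.
- ERBS explores configurational space considerably more than uncertainty-driven dynamics. This is measured by mean squared displacements of BMIM⁺BF₄⁻ trajectories.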
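
The first finding concerns free energy surfaces. A common way to estimate one along a CV `s` is `F(s) = -k_B T ln P(s)`. Configurations sampled under a bias `V` are reweighted by `w ∝ exp(beta V)`; for unbiased runs, `V = 0`. The sketch below is a generic 1D histogram estimator under those assumptions. It is not necessarily the reconstruction protocol used in the paper. The function name and the toy data are hypothetical. For alanine dipeptide one would typically histogram two dihedral angles with `np.histogram2d` instead.

```python
import numpy as np


def free_energy_1d(s, bias, beta, bins=50, range_=None):
    """Reweighted histogram estimate F(s) = -ln P(s) / beta, shifted so min(F) = 0.

    s:    CV values of the sampled configurations
    bias: bias potential V(s) acting on each configuration (0 for unbiased runs)
    """
    # Unbiasing weights w = exp(beta * V); subtract the max for numerical stability.
    logw = beta * bias
    w = np.exp(logw - logw.max())
    hist, edges = np.histogram(s, bins=bins, range=range_, weights=w, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    with np.errstate(divide="ignore"):
        f = -np.log(hist) / beta  # empty bins become +inf
    f -= f[np.isfinite(f)].min()
    return centers, f


# Toy check: the true surface is F0(s) = k s^2 / 2, and the bias V = -F0 flattens it,
# so the "biased run" samples s uniformly. Reweighting should recover F0.
rng = np.random.default_rng(1)
beta, k = 1.0, 2.0
s = rng.uniform(-2.0, 2.0, size=200_000)
bias = -0.5 * k * s**2
centers, f = free_energy_1d(s, bias, beta, bins=40, range_=(-2.0, 2.0))
f_true = 0.5 * k * centers**2
f_true -= f_true.min()
print(np.max(np.abs(f - f_true)))  # small: only statistical and binning error
```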

## Abstract

In this work, we
present enhanced representation-based sampling
(ERBS), a novel enhanced sampling method designed to generate structurally
diverse training data sets for machine-learned interatomic potentials.
ERBS automatically identifies collective variables by dimensionality
reduction of atomic descriptors and applies a bias potential inspired
by the on-the-fly probability enhanced sampling (OPES) framework. We highlight
the ability of Gaussian moment descriptors to capture collective molecular
motions and explore the impact of biasing parameters using alanine
dipeptide as a benchmark system. We show that free energy surfaces
can be reconstructed with high fidelity using only short biased trajectories
as training data. Further, we apply the method to the iterative
construction of a liquid water data set. We active-learn models with
and without enhanced sampling and compare the self-diffusion
coefficients simulated with models trained on molecular dynamics data
and on ERBS data. The coefficients obtained with ERBS-trained models
closely match those simulated with a reference model, at a
significantly reduced data set size. Finally, we compare the sampling
behavior of enhanced sampling methods by benchmarking the mean squared
displacements of BMIM⁺BF₄⁻ trajectories simulated with
uncertainty-driven dynamics and with ERBS. We find that the latter
significantly increases the exploration of configurational space.
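
Both the liquid water comparison and the BMIM⁺BF₄⁻ benchmark rest on mean squared displacements (MSDs). Below is a minimal sketch of the standard Einstein-relation estimate for 3D diffusion, `D = lim_{t -> inf} MSD(t) / (6 t)`. It makes these assumptions:

- Coordinates are unwrapped and stored in an array of shape `(n_frames, n_atoms, 3)`.
- The time step between frames is fixed.

The paper's analysis settings are not given here. These include the fit window, averaging over time origins and finite-size corrections. The function names are hypothetical.

```python
import numpy as np


def mean_squared_displacement(positions, max_lag):
    """MSD(lag), averaged over atoms and all available time origins.

    positions: unwrapped coordinates, shape (n_frames, n_atoms, 3)
    Returns an array of length max_lag + 1 with MSD[0] = 0.
    """
    msd = np.zeros(max_lag + 1)
    for lag in range(1, max_lag + 1):
        disp = positions[lag:] - positions[:-lag]  # displacements for every origin
        msd[lag] = np.mean(np.sum(disp**2, axis=-1))
    return msd


def diffusion_coefficient(msd, dt, fit_start):
    """Einstein relation in 3D: D = slope of MSD(t) / 6.

    Fitting only lags >= fit_start skips the short-time (ballistic) regime.
    Units: length^2 / time, matching the units of positions and dt.
    """
    t = np.arange(len(msd)) * dt
    slope, _ = np.polyfit(t[fit_start:], msd[fit_start:], 1)
    return slope / 6.0


# Toy check: Brownian motion with known D (per-axis step variance 2 * D * dt).
rng = np.random.default_rng(2)
d_true, dt = 0.5, 0.01
steps = rng.normal(scale=np.sqrt(2 * d_true * dt), size=(5000, 100, 3))
positions = np.cumsum(steps, axis=0)
msd = mean_squared_displacement(positions, max_lag=100)
d_est = diffusion_coefficient(msd, dt, fit_start=10)
print(d_true, d_est)
```

In a real liquid, the linear regime starts only after the ballistic and caging regimes. The choice of `fit_start` therefore matters more than in this toy example.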

## Full-text entities

- **Chemicals:** water (MESH:D014867), alanine dipeptide (-)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12937107/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12937107/full.md

## References

85 references — full list in the complete paper: https://tomesphere.com/paper/PMC12937107/full.md

---
Source: https://tomesphere.com/paper/PMC12937107