# Random Functions as Data Compressors for Machine Learning of Molecular Processes

**Authors:** Jayashrita Debnath, Gerhard Hummer

PMC · DOI: 10.1021/acs.jctc.5c01638 · 2026-01-29

## TL;DR

This paper shows that random nonlinear projections can compress large feature spaces from molecular dynamics simulations without a substantial loss of information, making machine learning analysis faster.

## Contribution

The contribution is an efficient procedure for generating random nonlinear projections and using them to compress high-dimensional feature spaces for molecular ML tasks.

## Key findings

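- Random nonlinear projections retain the core static and dynamic information of the original high-dimensional feature space for the two test cases, NTL9 and the double-norleucine variant of the villin headpiece.
- Compression makes trajectory analysis of protein folding simulations more robust.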

## Abstract

Machine learning (ML) is rapidly transforming the way molecular
dynamics simulations are performed and analyzed, from materials modeling
to studies of protein folding and function. ML algorithms are often
employed to learn low-dimensional representations of conformational
landscapes and cluster trajectories into relevant metastable states.
Most of these algorithms require the selection of a small number of
features that describe the problem of interest. Although deep neural
networks can tackle large numbers of input features, the training
costs increase with input size, which makes the selection of a subset
of features mandatory for most problems of practical interest. Here,
we show that random nonlinear projections can be used to compress
large feature spaces and make computations faster without a substantial
loss of information. We describe an efficient way to produce random
projections and then exemplify the general procedure for protein folding.
For our test cases, NTL9 and the double-norleucine variant of the villin
headpiece, we find that random compression retains the core static
and dynamic information of the original high-dimensional feature space,
making trajectory analysis more robust.
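
To make the idea of compressing a large feature space concrete, here is a minimal sketch of one possible random nonlinear projection. It is not the construction used in the paper, which the abstract does not specify. The function name `random_nonlinear_projection`, the Gaussian weights, the `tanh` nonlinearity, and the synthetic input data are all assumptions chosen for illustration.

```python
import numpy as np


def random_nonlinear_projection(features, n_out, seed=0):
    """Compress (n_frames, n_features) data to (n_frames, n_out).

    Hypothetical illustration: a fixed random linear map followed by tanh.
    The weights are drawn once and never trained.
    """
    rng = np.random.default_rng(seed)
    n_in = features.shape[1]

    # Standardize each input feature so that no single one dominates.
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    z = (features - mean) / std

    # Assumed choice: Gaussian weights scaled by 1/sqrt(n_in) keep the
    # pre-activations of order one, so tanh does not fully saturate.
    weights = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out))
    bias = rng.uniform(-1.0, 1.0, size=n_out)

    return np.tanh(z @ weights + bias)


# Synthetic stand-in for trajectory features (e.g. pairwise distances);
# real input would come from a molecular dynamics trajectory.
rng = np.random.default_rng(1)
frames = rng.normal(size=(500, 2000))
compressed = random_nonlinear_projection(frames, n_out=32)
print(compressed.shape)
```

In this sketch the 2000 input features per frame are reduced to 32 random nonlinear combinations, which could then be passed to a downstream dimensionality-reduction or clustering method in place of a hand-selected feature subset. Because the seed is fixed, the same projection is applied to every trajectory.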

## Linked entities

- **Proteins:** NTL9 (N-terminal domain of ribosomal protein L9)

## Full-text entities

- **Diseases:** MD (MESH:D000092242)
- **Chemicals:** Alanine Dipeptide (-), norleucine (MESH:D009646), water (MESH:D014867)

## Figures

15 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12895416/full.md

---
Source: https://tomesphere.com/paper/PMC12895416