# clusttraj: A Solvent-Informed Clustering Tool for Molecular Modeling

**Authors:** Rafael Bicudo Ribeiro, Henrique Musseli Cezar

PMC · DOI: 10.1021/acs.jctc.5c00634 · 2025-07-03

## TL;DR

clusttraj is a Python clustering tool for molecular configurations that corrects RMSD values inflated by permutations of identical molecules, such as solvent, before running hierarchical clustering.

## Contribution

clusttraj introduces a solvent-informed clustering method that combines atom reordering schemes with the Kabsch algorithm to minimize the RMSD between configurations before hierarchical clustering.

## Key findings

- clusttraj reduces inflated RMSD values by finding optimal pairings between configurations.
- The tool is effective for solute-solvent systems, including water clusters and solvated proteins.
- Evaluation metrics allow the clustering threshold to be determined automatically and the available linkage schemes to be compared.
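
The automated threshold selection mentioned in the last finding can be illustrated with a minimal sketch. It assumes the evaluation metric is the mean silhouette coefficient, which this summary does not confirm. The names `mean_silhouette` and `pick_threshold` are hypothetical helpers and not part of the clusttraj API. Hierarchical clustering uses SciPy's `linkage` and `fcluster`.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def mean_silhouette(dist, labels):
    """Mean silhouette coefficient computed from a square distance matrix."""
    n = len(labels)
    uniq = np.unique(labels)
    if len(uniq) < 2 or len(uniq) >= n:
        return -1.0  # undefined for one cluster or all-singleton partitions
    scores = np.zeros(n)
    for i in range(n):
        same = labels == labels[i]
        if same.sum() == 1:
            continue  # singleton clusters score 0 by convention
        # Mean distance to the other members of the same cluster.
        a = dist[i, same].sum() / (same.sum() - 1)
        # Smallest mean distance to any other cluster.
        b = min(dist[i, labels == k].mean() for k in uniq if k != labels[i])
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return float(scores.mean())


def pick_threshold(dist, method="average", n_candidates=50):
    """Scan distance thresholds and keep the one with the best silhouette."""
    Z = linkage(squareform(dist, checks=False), method=method)
    candidates = np.linspace(Z[:, 2].min(), Z[:, 2].max(), n_candidates)
    best = max(
        candidates,
        key=lambda t: mean_silhouette(dist, fcluster(Z, t=t, criterion="distance")),
    )
    return float(best), fcluster(Z, t=best, criterion="distance")
```

Calling `pick_threshold` with different `method` values (for example `"average"`, `"complete"`, or `"ward"`) on the same RMSD matrix gives one simple way to compare linkage schemes by their best silhouette score.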

## Abstract

Clustering techniques are consolidated as a powerful strategy for analyzing the extensive data generated from molecular modeling. In particular, some tools have been developed to cluster configurations from classical simulations with a standard focus on individual units, ranging from small molecules to complex proteins. Since the standard approach includes computing the root mean square deviation (RMSD) of atomic positions, accounting for the permutation between atoms is crucial for optimizing the clustering procedure in the presence of identical molecules. To address this issue, we present the clusttraj program, a solvent-informed clustering package that fixes inflated RMSD values by finding the optimal pairing between configurations. The program combines reordering schemes with the Kabsch algorithm to minimize the RMSD of molecular configurations before running a hierarchical clustering protocol. By considering evaluation metrics, one can determine the ideal threshold in an automated fashion and compare the different linkage schemes available. The program capabilities are exemplified by considering solute–solvent systems ranging from pure water clusters to a solvated protein or a small solute in different solvents. As a result, we investigate the dependence on different parameters, such as the system size and reordering method, and also the representativeness of the cluster medoids for the characterization of optical properties. clusttraj is implemented as a Python library and can be employed to cluster generic ensembles of molecular configurations that go beyond solute–solvent systems.
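
To make the idea of pairing atoms before computing the RMSD concrete, the following is a minimal sketch rather than the clusttraj implementation. It assumes the reordering scheme is the Hungarian algorithm (SciPy's `linear_sum_assignment`) applied per element, and that the two configurations are already roughly aligned, for instance on the solute, before reordering. The function names are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal superposition."""
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(P.T @ Q)
    # Guard against improper rotations (reflections).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return float(np.sqrt(np.mean(np.sum((P @ R.T - Q) ** 2, axis=1))))


def reorder_by_hungarian(P, Q, elements):
    """Indices into Q that best match the atoms of P, element by element."""
    order = np.empty(len(P), dtype=int)
    for el in np.unique(elements):
        idx = np.where(elements == el)[0]
        # Squared distances between every same-element atom pair.
        diff = P[idx][:, None, :] - Q[idx][None, :, :]
        rows, cols = linear_sum_assignment(np.sum(diff ** 2, axis=2))
        order[idx[rows]] = idx[cols]
    return order


def permutation_aware_rmsd(P, Q, elements):
    """Center, reorder identical atoms, then compute the Kabsch RMSD."""
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    order = reorder_by_hungarian(P, Q, elements)
    return kabsch_rmsd(P, Q[order])
```

For two frames that differ only by a relabeling of identical atoms, `kabsch_rmsd` alone reports a large value while `permutation_aware_rmsd` returns essentially zero. The pairwise values from the latter would then fill the distance matrix passed to hierarchical clustering.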

## Full-text entities

- **Chemicals:** water (MESH:D014867)

## Figures

14 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12288011/full.md

---
Source: https://tomesphere.com/paper/PMC12288011