# PepScorer::RMSD: An Improved Machine Learning Scoring Function for Protein–Peptide Docking

**Authors:** Andrea Giuseppe Cavalli, Giulio Vistoli, Alessandro Pedretti, Laura Fumagalli, Angelica Mazzolari

PMC · DOI: 10.3390/ijms27020870 · 2026-01-15

## TL;DR

PepScorer::RMSD is a machine learning scoring function that predicts the RMSD of a docked peptide pose relative to its native conformation, improving pose selection in protein–peptide docking and virtual screening of peptide libraries.

## Contribution

The paper introduces PepScorer::RMSD, a machine learning scoring function tailored to protein–peptide docking that outperforms conventional, ML-based, and peptide-specific scoring functions in pose selection and docking power.

## Key findings

- PepScorer::RMSD achieved a Pearson correlation of 0.70 and a mean absolute error of 1.77 Å in RMSD prediction.
- The model demonstrated top-1 docking power values of 92% on the evaluation set and 81% on an external test set.
- The workflow was benchmarked against AlphaFold-Multimer predictions, confirming its robustness for virtual screening.
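
As a rough illustration of the two regression metrics quoted above, the following sketch computes a Pearson correlation and a mean absolute error between predicted and reference RMSD values. The function names and the four sample values are hypothetical and chosen only for demonstration; they are not taken from the paper or its dataset.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def mean_absolute_error(xs, ys):
    """Average absolute difference between paired values (same unit as input)."""
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical RMSD values in angstroms, for illustration only.
true_rmsd = [1.0, 2.0, 3.0, 4.0]
pred_rmsd = [1.5, 2.5, 2.5, 4.5]

r = pearson_r(true_rmsd, pred_rmsd)
mae = mean_absolute_error(true_rmsd, pred_rmsd)
print(f"Pearson r = {r:.2f}, MAE = {mae:.2f} A")
```

The two metrics are complementary: the correlation measures how well the predicted values track the reference values, while the MAE reports the typical size of the error in ångströms.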

## Abstract

Over the past two decades, pharmaceutical peptides have emerged as a powerful alternative to traditional small molecules, offering high potency, specificity, and low toxicity. However, most computational drug discovery tools remain optimized for small molecules and need to be entirely adapted to peptide-based compounds. Molecular docking algorithms, commonly employed to rank drug candidates in early-stage drug discovery, often fail to accurately predict peptide binding poses because peptides are highly conformationally flexible and scoring functions are not tailored to them. To address these limitations, we present PepScorer::RMSD, a novel machine learning-based scoring function specifically designed for pose selection and enhancement of docking power (DP) in virtual screening campaigns targeting peptide libraries. The model predicts the root-mean-squared deviation (RMSD) of a peptide pose relative to its native conformation using a curated dataset of protein–peptide complexes (3–10 amino acids). PepScorer::RMSD outperformed conventional, ML-based, and peptide-specific scoring functions, achieving a Pearson correlation of 0.70, a mean absolute error of 1.77 Å, and top-1 DP values of 92% on the evaluation set and 81% on an external test set. Our PLANTS-based workflow was benchmarked against AlphaFold-Multimer predictions, confirming its robustness for virtual screening. PepScorer::RMSD and the curated dataset are freely available on Zenodo.
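
To make the notion of top-1 docking power concrete, here is a minimal sketch of how such a value could be computed once a model has assigned a predicted RMSD to each docked pose. It rests on assumptions not stated in this summary: the top-ranked pose is taken to be the one with the lowest predicted RMSD, and a pose counts as a success when its true RMSD is at or below 2.0 Å, a threshold commonly used in docking benchmarks. The data layout, field names, and complex identifiers are hypothetical.

```python
def top1_docking_power(complexes, threshold=2.0):
    """Fraction of complexes whose top-ranked pose is near-native.

    `complexes` maps a complex identifier to a list of poses; each pose is a
    dict with a model-predicted RMSD and the true RMSD to the native pose.
    The 2.0 A default threshold is an assumption, not a value from the paper.
    """
    hits = 0
    for poses in complexes.values():
        # Rank poses by predicted RMSD and keep the best-scored one.
        best = min(poses, key=lambda pose: pose["predicted_rmsd"])
        if best["true_rmsd"] <= threshold:
            hits += 1
    return hits / len(complexes)


# Hypothetical poses for two complexes, for illustration only.
example = {
    "complex_a": [
        {"predicted_rmsd": 1.2, "true_rmsd": 1.0},
        {"predicted_rmsd": 4.8, "true_rmsd": 6.3},
    ],
    "complex_b": [
        {"predicted_rmsd": 2.1, "true_rmsd": 5.5},
        {"predicted_rmsd": 3.0, "true_rmsd": 1.4},
    ],
}

dp = top1_docking_power(example)
print(f"Top-1 docking power: {dp:.0%}")
```

In this toy input the best-scored pose of `complex_a` is near-native while that of `complex_b` is not, so one of two complexes counts as a success.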

## Full-text entities

- **Diseases:** toxicity (MESH:D064420)
- **Chemicals:** amino acids (MESH:D000596)

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12842220/full.md

---
Source: https://tomesphere.com/paper/PMC12842220