PepBenchmark: A Standardized Benchmark for Peptide Machine Learning

Jiahui Zhang; Rouyi Wang; Kuangqi Zhou; Tianshu Xiao; Lingyan Zhu; Yaosen Min; Yang Wang

arXiv:2604.10531 · cs.LG · April 14, 2026

TL;DR

PepBenchmark introduces a comprehensive, standardized benchmark suite for peptide drug discovery, unifying datasets, preprocessing, and evaluation to accelerate AI research in peptide therapeutics.

Contribution

PepBenchmark provides the first unified benchmark with curated datasets, preprocessing pipelines, and evaluation protocols for peptide ML models.

Findings

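01. The most comprehensive AI-ready peptide dataset resource to date, to the best of the authors' knowledge (29 canonical-peptide and 6 non-canonical-peptide datasets across 7 groups)
02. A standardized preprocessing pipeline improves data quality and consistency
03. A unified evaluation protocol enables fair comparison of models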

Abstract

Peptide therapeutics are widely regarded as the "third generation" of drugs, yet progress in peptide machine learning (ML) is hindered by the absence of standardized benchmarks. Here we present PepBenchmark, which unifies datasets, preprocessing, and evaluation protocols for peptide drug discovery. PepBenchmark comprises three components: (1) PepBenchData, a well-curated collection of 29 canonical-peptide and 6 non-canonical-peptide datasets across 7 groups that systematically covers key aspects of peptide drug development and represents, to the best of our knowledge, the most comprehensive AI-ready dataset resource to date; (2) PepBenchPipeline, a standardized preprocessing pipeline that ensures consistent dataset cleaning, construction, splitting, and feature transformation, mitigating quality issues common in ad hoc pipelines; and (3) PepBenchLeaderboard, a unified evaluation…
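The abstract names the preprocessing stages (cleaning, construction, splitting, feature transformation) but not PepBenchPipeline's interface. As a minimal sketch of what fixing these stages in code looks like, the hypothetical Python below assumes canonical peptides stored as (one-letter sequence, binary label) pairs. The names `clean_sequences`, `seeded_split`, and `aa_composition` are illustrative only and are not PepBenchmark's API; the dataset-construction stage is omitted.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

# The 20 standard amino acids in one-letter code.
CANONICAL_AA = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical record schema: (sequence, label).
Record = Tuple[str, int]


def clean_sequences(records: Sequence[Record]) -> List[Record]:
    """Normalize case and whitespace, drop empty or non-canonical sequences,
    and deduplicate, keeping the first label seen for each sequence."""
    kept = {}
    for seq, label in records:
        seq = seq.strip().upper()
        if not seq or any(aa not in CANONICAL_AA for aa in seq):
            continue  # empty, or contains a non-canonical residue
        kept.setdefault(seq, label)  # first occurrence wins
    return list(kept.items())  # dicts preserve insertion order (Python 3.7+)


def seeded_split(
    records: Sequence[Record],
    frac: Tuple[float, float] = (0.8, 0.1),
    seed: int = 0,
) -> Tuple[List[Record], List[Record], List[Record]]:
    """Seeded random train/valid/test split. Whatever remains after the
    train and valid fractions becomes the test set. This does NOT control
    for sequence similarity between partitions."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # same seed -> same partitions
    n_train = int(frac[0] * len(shuffled))
    n_valid = int(frac[1] * len(shuffled))
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_valid],
        shuffled[n_train + n_valid:],
    )


def aa_composition(seq: str) -> List[float]:
    """20-dimensional feature vector: fraction of each canonical residue."""
    counts = Counter(seq)
    return [counts[aa] / len(seq) for aa in CANONICAL_AA]


if __name__ == "__main__":
    raw = [(" glfdiikkiaesf ", 1), ("GLFDIIKKIAESF", 0), ("ACDXK", 0), ("KWKLFKKI", 1)]
    data = clean_sequences(raw)
    print(data)  # [('GLFDIIKKIAESF', 1), ('KWKLFKKI', 1)]
    train, valid, test = seeded_split(data, seed=42)
    X_train = [aa_composition(s) for s, _ in train]
```

Because the cleaning rules, split seed, and featurization are fixed in code, every model is trained and scored on identical partitions. A plain random split keeps the sketch short. Sequence benchmarks often prefer similarity-aware splits to limit leakage between near-identical peptides, and this sketch does not attempt that.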

Code & Models

Repositories

ZGCI-AI4S-Pep/PepBenchmark (GitHub)

Videos

PepBenchmark: A Standardized Benchmark for Peptide Machine Learning (SlidesLive)