PSBench: a large-scale benchmark for estimating the accuracy of protein complex structural models

Pawan Neupane; Jian Liu; Jianlin Cheng

arXiv:2505.22674 · q-bio.BM · May 30, 2025

TL;DR

PSBench is a benchmark suite of over one million labeled protein complex structural models, built for training and evaluating methods that estimate model accuracy so that predicted structures can be ranked and selected more reliably.

Contribution

We created PSBench, a large-scale, well-annotated dataset for training and evaluating EMA methods, and demonstrated its utility by developing and testing a new graph transformer-based EMA method, GATE.

Findings

01 GATE ranked among the top EMA methods in CASP16

02 PSBench enables rigorous evaluation of EMA techniques

03 Large, diverse dataset improves model accuracy estimation

Abstract

Predicting protein complex structures is essential for protein function analysis, protein design, and drug discovery. While AI methods like AlphaFold can predict accurate structural models for many protein complexes, reliably estimating the quality of these predicted models (estimation of model accuracy, or EMA) for model ranking and selection remains a major challenge. A key barrier to developing effective machine learning-based EMA methods is the lack of large, diverse, and well-annotated datasets for training and evaluation. To address this gap, we introduce PSBench, a benchmark suite comprising four large-scale, labeled datasets generated during the 15th and 16th community-wide Critical Assessment of Protein Structure Prediction (CASP15 and CASP16). PSBench includes over one million structural models covering a wide range of protein sequence lengths, complex stoichiometries,…
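To make the model ranking and selection task concrete, here is a minimal sketch of how an EMA method's output might be scored on a single target. It assumes two measures that are commonly used for EMA evaluation, Pearson correlation and ranking loss; the excerpt above does not confirm which metrics PSBench itself reports. The function names and all scores are hypothetical and are not taken from the PSBench data or code.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def ranking_loss(predicted, true):
    """True quality of the best model minus true quality of the model
    that the EMA method ranks first (0 means a perfect top-1 pick)."""
    top = max(range(len(predicted)), key=lambda i: predicted[i])
    return max(true) - true[top]


# Hypothetical quality scores for five structural models of one target.
# These values are made up for illustration only.
true_quality = [0.82, 0.45, 0.91, 0.60, 0.30]
predicted_quality = [0.85, 0.50, 0.80, 0.55, 0.35]

if __name__ == "__main__":
    print("Pearson correlation:", round(pearson(predicted_quality, true_quality), 3))
    print("Ranking loss:", round(ranking_loss(predicted_quality, true_quality), 3))
```

A high correlation shows that the predicted scores track true quality across all models, while the ranking loss isolates the cost of the single selection decision, which is what matters when only the top-ranked model is kept.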

Code & Models

Repositories

bioinfomachinelearning/psbench (Official)

Videos

PSBench: a large-scale benchmark for estimating the accuracy of protein complex structural models · SlidesLive

Taxonomy

Methods: AlphaFold