TL;DR
PSBench is a large benchmark suite of over one million labeled protein complex models. It is designed to support the development and evaluation of methods that estimate model accuracy in protein structure prediction, enabling better model ranking and selection.
Contribution
We created PSBench, a large-scale, well-annotated benchmark for training and evaluating estimation of model accuracy (EMA) methods. We demonstrated its utility by using it to develop and test GATE, a new graph transformer-based EMA method.
Findings
GATE ranked among the top EMA methods in CASP16
PSBench enables rigorous evaluation of EMA techniques
Large, diverse dataset improves model accuracy estimation
Abstract
Predicting protein complex structures is essential for protein function analysis, protein design, and drug discovery. While AI methods like AlphaFold can predict accurate structural models for many protein complexes, reliably estimating the quality of these predicted models (estimation of model accuracy, or EMA) for model ranking and selection remains a major challenge. A key barrier to developing effective machine learning-based EMA methods is the lack of large, diverse, and well-annotated datasets for training and evaluation. To address this gap, we introduce PSBench, a benchmark suite comprising four large-scale, labeled datasets generated during the 15th and 16th community-wide Critical Assessment of Protein Structure Prediction (CASP15 and CASP16). PSBench includes over one million structural models covering a wide range of protein sequence lengths, complex stoichiometries,…
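The abstract frames EMA as scoring predicted models so that the best one can be selected for each target. The sketch below shows how such scores are often assessed per target in CASP-style EMA evaluations. It assumes each model has a true quality label (for example a TM-score) and uses two common metrics: the Pearson correlation between predicted and true scores, and the top-1 ranking loss, which is the gap between the best true score and the true score of the model the EMA method ranks first. The function names and example scores are hypothetical. This is not PSBench's evaluation code.

```python
import math
from statistics import mean


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when either score list is constant
    return cov / (sx * sy)


def top1_ranking_loss(predicted: dict[str, float], true: dict[str, float]) -> float:
    """Best true score minus the true score of the model ranked first by the EMA method."""
    best_true = max(true.values())
    chosen = max(predicted, key=predicted.get)  # model selected by the EMA scores
    return best_true - true[chosen]


def evaluate_target(predicted: dict[str, float], true: dict[str, float]) -> dict[str, float]:
    """Per-target metrics; both dicts map the same (hypothetical) model IDs to scores."""
    models = sorted(true)
    p = [predicted[m] for m in models]
    t = [true[m] for m in models]
    return {"pearson": pearson(p, t), "ranking_loss": top1_ranking_loss(predicted, true)}


if __name__ == "__main__":
    # Hypothetical scores for three models of a single target.
    predicted = {"m1": 0.85, "m2": 0.50, "m3": 0.70}
    true = {"m1": 0.80, "m2": 0.60, "m3": 0.90}
    print(evaluate_target(predicted, true))  # ranking_loss is about 0.10
```

In this example the EMA scores rank `m1` first, but `m3` has the highest true score, so the ranking loss is about 0.10 even though the correlation is positive. Benchmarks typically average such per-target metrics across all targets.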
Taxonomy
Methods: AlphaFold
