PDB-Struct: A Comprehensive Benchmark for Structure-based Protein Design
Chuanrui Wang, Bozitao Zhong, Zuobai Zhang, Narendra Chaudhary, Sanchit Misra, Jian Tang

TL;DR
This paper introduces PDB-Struct, a comprehensive benchmark with novel metrics for evaluating structure-based protein design methods, addressing limitations of existing validation approaches and enabling fair comparison of recent algorithms.
Contribution
The paper presents a new benchmark dataset and two innovative metrics for assessing protein design methods, improving evaluation accuracy beyond traditional recovery and perplexity measures.
Findings
ByProt, ProteinMPNN, and ESM-IF perform well on the benchmark.
ESM-Design and AF-Design underperform on refoldability.
Some methods show high sequence recovery but low true foldability.
Abstract
Structure-based protein design has attracted increasing interest, with numerous methods being introduced in recent years. However, a universally accepted method for evaluation has not been established, since wet-lab validation can be overly time-consuming for the development of new algorithms, and validation with recovery and perplexity metrics is efficient but may not precisely reflect true foldability. To address this gap, we introduce two novel metrics: a refoldability-based metric, which leverages high-accuracy protein structure prediction models as a proxy for wet-lab experiments, and a stability-based metric, which assesses whether models can assign high likelihoods to experimentally stable proteins. We curate datasets from high-quality CATH protein data, high-throughput designed proteins, and mega-scale experimental mutagenesis…
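For readers unfamiliar with the two traditional metrics the abstract mentions, the following is a minimal sketch of how sequence recovery and perplexity are commonly computed. It is not taken from the paper's code. The function names and the input format (a native sequence, a designed sequence of the same length, and the natural-log probabilities a model assigns to each native residue) are assumptions made for illustration.

```python
import math


def sequence_recovery(native: str, designed: str) -> float:
    """Fraction of positions where the designed residue matches the native one.

    Hypothetical helper: assumes both sequences are aligned and of equal length.
    """
    if len(native) != len(designed):
        raise ValueError("sequences must have the same length")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(a == b for a, b in zip(native, designed))
    return matches / len(native)


def perplexity(log_probs: list[float]) -> float:
    """Exponential of the mean negative log-likelihood per residue.

    Hypothetical helper: `log_probs` holds the natural-log probability the
    model assigns to the native residue at each position.
    """
    if not log_probs:
        raise ValueError("log_probs must be non-empty")
    return math.exp(-sum(log_probs) / len(log_probs))


if __name__ == "__main__":
    # Three of four residues match.
    print(sequence_recovery("ACDE", "ACDF"))
    # A model that is uniform over four choices at every position.
    print(perplexity([math.log(0.25)] * 4))
```

Both quantities depend only on the designed sequence and the model's probabilities, never on a predicted or measured structure, which is why a method can score well on them without producing sequences that actually fold.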
Taxonomy
Topics: Protein Structure and Dynamics · Machine Learning in Bioinformatics · Microbial Metabolic Engineering and Bioproduction
