General Protein Pretraining or Domain-Specific Designs? Benchmarking Protein Modeling on Realistic Applications
Shuo Yan, Yuliang Yan, Bin Ma, Chenao Li, Haochun Tang, Jiahua Lu, Minhua Lin, Yuyuan Feng, Enyan Dai

TL;DR
This paper introduces Protap, a benchmark for evaluating protein modeling methods across diverse real-world applications, highlighting the effectiveness of structural information and domain-specific priors over large-scale pretraining.
Contribution
Protap systematically compares various architectures and strategies on multiple realistic protein tasks, including novel industrially relevant applications.
Findings
Large-scale pretrained models often underperform supervised models trained on small downstream datasets.
Incorporating structural information during fine-tuning can rival large pretrained models.
Domain-specific priors improve performance on specialized tasks.
Abstract
Recently, extensive deep learning architectures and pretraining strategies have been explored to support downstream protein applications. Additionally, domain-specific models incorporating biological knowledge have been developed to enhance performance in specialized tasks. In this work, we introduce Protap, a comprehensive benchmark that systematically compares backbone architectures, pretraining strategies, and domain-specific models across diverse and realistic downstream protein applications. Specifically, Protap covers five applications: three general tasks and two novel specialized tasks, i.e., enzyme-catalyzed protein cleavage site prediction and targeted protein degradation, which are industrially relevant yet missing from existing benchmarks. For each application, Protap compares various domain-specific models and general architectures under multiple pretraining…
Taxonomy
Topics: Machine Learning in Bioinformatics · Protein Structure and Dynamics · Advanced Graph Neural Networks
