Evaluating Protein Transfer Learning with TAPE
Roshan Rao, Nicholas Bhattacharya, Neil Thomas, Yan Duan, Xi Chen, John Canny, Pieter Abbeel, and Yun S. Song

TL;DR
This paper introduces TAPE, a benchmark suite of five biologically relevant semi-supervised learning tasks for protein modeling. It shows that self-supervised pretraining improves performance, but that the learned features often still lag behind traditional methods, which leaves room for better architectures.
Contribution
The paper presents TAPE, a standardized benchmark for evaluating protein embeddings across diverse tasks, and provides a comprehensive analysis of semi-supervised learning approaches in protein modeling.
Findings
Self-supervised pretraining improves performance on all tasks.
Pretraining can more than double model performance.
Features learned by current models often still underperform traditional methods.
Abstract
Protein modeling is an increasingly popular area of machine learning research. Semi-supervised learning has emerged as an important paradigm in protein modeling due to the high cost of acquiring supervised protein labels, but the current literature is fragmented when it comes to datasets and standardized evaluation techniques. To facilitate progress in this field, we introduce the Tasks Assessing Protein Embeddings (TAPE), a set of five biologically relevant semi-supervised learning tasks spread across different domains of protein biology. We curate tasks into specific training, validation, and test splits to ensure that each task tests biologically relevant generalization that transfers to real-life scenarios. We benchmark a range of approaches to semi-supervised protein representation learning, which span recent work as well as canonical sequence learning techniques. We find that…
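The abstract's point about curated training, validation, and test splits can be illustrated with a small sketch. This is not TAPE's actual split procedure: it only assumes that each record carries some grouping label (the hypothetical `family` key below) and assigns whole groups to a single split, so that the test set contains groups never seen in training. The function name `split_by_group`, the record layout, and the 80/10/10 fractions are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def split_by_group(records, group_key, fractions=(0.8, 0.1, 0.1), seed=0):
    """Assign whole groups to train/valid/test so no group spans two splits.

    `records` is a list of dicts; `group_key` names the (hypothetical)
    field that identifies each record's group, e.g. a protein family.
    """
    # Bucket records by their group label.
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_key]].append(rec)

    # Shuffle group names deterministically so the split is reproducible.
    names = sorted(groups)
    random.Random(seed).shuffle(names)

    # Fractions apply to the number of groups, not the number of records.
    n_train = int(round(fractions[0] * len(names)))
    n_valid = int(round(fractions[1] * len(names)))
    parts = {
        "train": names[:n_train],
        "valid": names[n_train:n_train + n_valid],
        "test": names[n_train + n_valid:],
    }

    # Expand each split's group names back into records.
    return {
        split: [rec for name in group_names for rec in groups[name]]
        for split, group_names in parts.items()
    }


if __name__ == "__main__":
    # Toy data: 40 records spread over 10 made-up families.
    toy = [{"id": i, "family": f"fam{i % 10}"} for i in range(40)]
    for split, recs in split_by_group(toy, "family").items():
        print(split, len(recs), sorted({r["family"] for r in recs}))
```

Splitting by group rather than by individual record is one common way to make a held-out score reflect generalization to unseen groups instead of memorization of near-duplicates; which grouping is biologically meaningful depends on the task.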
Taxonomy
Topics: Machine Learning in Bioinformatics · RNA and protein synthesis mechanisms · Protein Structure and Dynamics
