Protein Fold Classification at Scale: Benchmarking and Pretraining
Dexiong Chen, Andrei Manolache, Mathias Niepert, Karsten Borgwardt

TL;DR
This paper introduces TEDBench, a large-scale benchmark for protein fold classification, and proposes MiAE, a scalable self-supervised model that outperforms existing methods on this benchmark and others.
Contribution
The paper presents TEDBench, a new large-scale, non-redundant benchmark, and introduces MiAE, a scalable self-supervised framework for protein structure representation learning.
Findings
MiAE outperforms supervised and state-of-the-art methods on TEDBench.
TEDBench is a large, non-redundant benchmark constructed from TED and AlphaFold structures.
MiAE scales well and effectively transfers to experimental structure datasets.
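This page does not describe how TEDBench's splits are drawn. One common way to keep a structure benchmark non-redundant is to assign whole clusters, rather than individual domains, to train or test. The minimal Python sketch below shows that idea. It assumes each domain ID already carries a cluster label, for example from a structural clustering tool such as Foldseek. The function name `cluster_split` and its parameters are hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict


def cluster_split(domain_to_cluster, test_fraction=0.2, seed=0):
    """Split domain IDs so that no cluster appears in both train and test.

    Hypothetical helper: domain_to_cluster maps a domain ID to an assumed
    precomputed cluster ID. Whole clusters are assigned to one split.
    """
    # Group domains by their cluster label.
    clusters = defaultdict(list)
    for domain, cluster in domain_to_cluster.items():
        clusters[cluster].append(domain)

    # Sort first so the shuffle is reproducible for a given seed.
    cluster_ids = sorted(clusters)
    random.Random(seed).shuffle(cluster_ids)

    # The first n_test shuffled clusters go to the test split.
    n_test = round(len(cluster_ids) * test_fraction)
    test_clusters = set(cluster_ids[:n_test])

    train, test = [], []
    for cid in cluster_ids:
        target = test if cid in test_clusters else train
        target.extend(sorted(clusters[cid]))
    return train, test
```

For example, `cluster_split({"d1": "c1", "d2": "c1", "d3": "c2"}, test_fraction=0.5)` always keeps `d1` and `d2` in the same split, because they share cluster `c1`. Splitting at the cluster level trades exact split sizes for the guarantee that near-duplicate structures never leak from train into test.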
Abstract
Classifying protein topology is essential for deciphering biological function, but progress is held back by the lack of large-scale benchmarks that avoid duplicates and by models that do not scale well. We introduce TEDBench, a large-scale, non-redundant benchmark for protein fold classification constructed from the Encyclopedia of Domains (TED) and Foldseek-clustered AlphaFold structures. We show that on TEDBench, current protein representation learning methods either require very large models or fail to deliver strong performance. To address this challenge, we propose Masked Invariant Autoencoders (MiAE), a self-supervised framework for protein structure representation learning. MiAE uses an extremely high masking ratio of up to 90% with an -invariant encoder and a lightweight decoder that reconstructs backbone coordinates from the latent representation and mask…
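The abstract names two ingredients: a very high masking ratio and an invariant encoder. The minimal Python sketch below illustrates both. It assumes uniform random masking over residues in the style of masked autoencoders. It also uses pairwise Cα distances as a simple stand-in for features that do not change under rigid rotations and translations. All function names (`random_mask`, `rotate_z`, `translate`, `pairwise_distances`) are hypothetical. This page does not specify MiAE's actual masking scheme, encoder, or decoder.

```python
import math
import random


def random_mask(num_residues, mask_ratio=0.9, seed=0):
    """Return (visible, masked) residue indices under uniform random masking.

    Assumed MAE-style scheme: with mask_ratio=0.9 only ~10% of residues
    reach the encoder, and the decoder must reconstruct the rest.
    """
    num_masked = int(round(num_residues * mask_ratio))
    order = list(range(num_residues))
    random.Random(seed).shuffle(order)
    masked = sorted(order[:num_masked])
    visible = sorted(order[num_masked:])
    return visible, masked


def rotate_z(coords, angle):
    """Apply a proper rotation about the z axis to a list of (x, y, z) points."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in coords]


def translate(coords, shift):
    """Shift every point by the vector `shift`."""
    dx, dy, dz = shift
    return [(x + dx, y + dy, z + dz) for x, y, z in coords]


def pairwise_distances(coords):
    """Distance matrix; unchanged by any rotation plus translation of coords."""
    return [[math.dist(a, b) for b in coords] for a in coords]


# Usage: mask 90% of a 100-residue chain, then check distance invariance
# on a toy 3-residue backbone (illustrative coordinates, in angstroms).
visible, masked = random_mask(100, mask_ratio=0.9)
toy = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.0, 1.0)]
moved = translate(rotate_z(toy, 0.7), (1.0, -2.0, 5.0))
before, after = pairwise_distances(toy), pairwise_distances(moved)
```

With a 90% ratio the encoder sees only 10 of the 100 residues. This keeps the expensive encoder pass small, which is consistent with the abstract's emphasis on scalability. Any encoder built only on invariant inputs such as `before` gives identical outputs for `toy` and `moved`.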
Code & Models
- 🤗 TEDBench/miae-s-sc (model, 36 downloads)
- 🤗 TEDBench/miae-b-sc (model, 32 downloads)
- 🤗 TEDBench/miae-l-sc (model, 29 downloads)
- 🤗 TEDBench/miae-b-seq-sc (model, 26 downloads)
- 🤗 TEDBench/miae-s-ft (model, 36 downloads)
- 🤗 TEDBench/miae-b-ft (model, 33 downloads)
- 🤗 TEDBench/miae-l-ft (model, 37 downloads)
- 🤗 TEDBench/miae-b-seq-ft (model, 29 downloads)
- 🤗 TEDBench/miae-s (model, 33 downloads)
- 🤗 TEDBench/miae-b (model, 36 downloads)
