Geometric Self-Supervised Pretraining on 3D Protein Structures using Subgraphs
Michail Chatzianastasis, Yang Zhang, George Dasoulas, Michalis Vazirgiannis

TL;DR
This paper introduces a novel self-supervised pretraining method for 3D protein graph neural networks that leverages subgraph distances to improve protein classification performance without relying on traditional masking or augmentation techniques.
Contribution
It proposes a new pretraining scheme based on predicting distances between subgraph centroids and the global protein centroid, enhancing geometric understanding of 3D protein structures.
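As a rough illustration of the regression targets this scheme describes, the sketch below computes the distance between each subgraph's centroid and the global centroid. It assumes each subgraph is given as a list of node indices into a coordinate array; how the paper actually samples subgraphs is not stated here, and the function name `centroid_distance_targets` is hypothetical.

```python
import numpy as np


def centroid_distance_targets(coords, subgraphs):
    """Distance from each subgraph centroid to the global centroid.

    coords:    (N, 3) array of 3D node positions (e.g. one per residue).
    subgraphs: iterable of index lists, one per subgraph (assumed input;
               the subgraph sampling strategy is not specified here).
    """
    coords = np.asarray(coords, dtype=float)
    # Global geometric centroid: mean position over all nodes.
    global_centroid = coords.mean(axis=0)

    targets = []
    for idx in subgraphs:
        # Local geometric centroid: mean position over the subgraph's nodes.
        local_centroid = coords[list(idx)].mean(axis=0)
        targets.append(np.linalg.norm(local_centroid - global_centroid))
    return np.array(targets)


# Toy example: four nodes on the corners of a 2 x 2 square in the z = 0 plane.
coords = np.array([[0, 0, 0], [2, 0, 0], [0, 2, 0], [2, 2, 0]], dtype=float)
subgraphs = [[0, 1], [2, 3], [0, 1, 2, 3]]
targets = centroid_distance_targets(coords, subgraphs)
```

In a pretraining loop, a model would be asked to predict these values from subgraph and whole-graph embeddings; the choice of regression loss is left open in this sketch.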
Findings
Achieves up to 6% performance improvement in protein classification tasks.
Eliminates the need for multiple views, augmentations, or masking strategies.
Provides a new direction for unsupervised learning in protein graph models.
Abstract
Protein representation learning aims to learn informative protein embeddings capable of addressing crucial biological questions, such as protein function prediction. Although sequence-based transformer models have shown promising results by leveraging the vast amount of protein sequence data in a self-supervised way, there is still a gap in exploiting the available 3D protein structures. In this work, we propose a pretraining scheme that goes beyond trivial masking methods by leveraging the 3D and hierarchical structures of proteins. We propose a novel self-supervised method to pretrain 3D graph neural networks on 3D protein structures by predicting the distances between local geometric centroids of protein subgraphs and the global geometric centroid of the protein. By considering subgraphs and their relationships to the global protein structure, our model can better learn the geometric…
Taxonomy
Topics: Mathematics, Computing, and Information Processing · Web Applications and Data Management · Image Processing and 3D Reconstruction
