Multi-Scale Representation Learning on Proteins
Vignesh Ram Somnath, Charlotte Bunne, Andreas Krause

TL;DR
This paper presents HoloProt, a multi-scale graph-based approach to protein representation learning that captures surface, structure, and sequence details. It improves performance on ligand binding affinity and protein function prediction tasks while reducing parameter count and memory use.
Contribution
It introduces a novel multi-scale graph construction for proteins, integrating surface, structure, and sequence information, and demonstrates its effectiveness on key biological prediction tasks.
Findings
Outperforms all baselines on most dataset splits for ligand binding affinity regression, with consistent results across splits.
Achieves near-top performance on protein function prediction with fewer parameters.
Segments the protein surface into superpixels to improve memory efficiency with minimal performance loss.
Abstract
Proteins are fundamental biological entities that play key roles in cellular function and disease. This paper introduces HoloProt, a multi-scale graph construction of a protein that connects surface to structure and sequence. The surface captures coarser details of the protein, while the sequence (the primary component) and the structure (comprising secondary and tertiary components) capture finer details. Our graph encoder then learns a multi-scale representation by allowing each level to integrate the encoding from the level(s) below with the graph at that level. We test the learned representation on two tasks: (i) ligand binding affinity (regression) and (ii) protein function prediction (classification). On the regression task, in contrast to previous methods, our model performs consistently and reliably across different dataset splits, outperforming all baselines on most splits. On the…
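The idea of each level integrating the encoding from the level below can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`message_pass`, `encode_multiscale`), the mean-aggregation layer, the residue-to-surface assignment matrix `assign`, and all dimensions are assumptions chosen only to show the data flow from a finer residue graph into a coarser surface graph.

```python
import numpy as np

def message_pass(features, adjacency, weight):
    """One round of mean-aggregation message passing followed by a ReLU."""
    # Add self-loops so every node keeps its own signal.
    adj = adjacency + np.eye(adjacency.shape[0])
    # Row-normalise so each node averages over its neighbourhood.
    adj = adj / adj.sum(axis=1, keepdims=True)
    return np.maximum(adj @ features @ weight, 0.0)

def encode_multiscale(res_feats, res_adj, surf_feats, surf_adj, assign, params):
    """Encode the finer (residue) level, then feed it into the coarser (surface) level."""
    # Lower level: encode the residue graph (hypothetical stand-in for sequence/structure).
    res_enc = message_pass(res_feats, res_adj, params["w_res"])
    # Lift: each surface node averages the encodings of the residues assigned to it.
    counts = np.maximum(assign.sum(axis=1, keepdims=True), 1.0)
    lifted = (assign @ res_enc) / counts
    # Upper level: the surface graph sees its own features plus the lifted encodings.
    surf_in = np.concatenate([surf_feats, lifted], axis=1)
    surf_enc = message_pass(surf_in, surf_adj, params["w_surf"])
    # Protein-level readout: mean over surface nodes.
    return surf_enc.mean(axis=0)

# Toy protein (all values are made up): 4 residues in a chain, 2 surface nodes.
rng = np.random.default_rng(0)
res_feats = rng.normal(size=(4, 3))
res_adj = np.array([[0, 1, 0, 0],
                    [1, 0, 1, 0],
                    [0, 1, 0, 1],
                    [0, 0, 1, 0]], dtype=float)
surf_feats = rng.normal(size=(2, 2))
surf_adj = np.array([[0, 1], [1, 0]], dtype=float)
# assign[i, j] = 1 if residue j lies under surface node i (assumed mapping).
assign = np.array([[1, 1, 0, 0],
                   [0, 0, 1, 1]], dtype=float)
params = {
    "w_res": rng.normal(size=(3, 5)),       # residue features -> 5-dim encoding
    "w_surf": rng.normal(size=(2 + 5, 6)),  # surface + lifted features -> 6-dim
}

embedding = encode_multiscale(res_feats, res_adj, surf_feats, surf_adj, assign, params)
print(embedding.shape)
```

In this sketch the surface level never sees raw residue features, only their encodings, which mirrors the bottom-up integration described in the abstract. A trained model would learn the weights and would likely use a more expressive message-passing layer.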
Taxonomy
Topics: Machine Learning in Materials Science · Protein Structure and Dynamics · Machine Learning in Bioinformatics
