PersGNN: Applying Topological Data Analysis and Geometric Deep Learning to Structure-Based Protein Function Prediction
Nicolas Swenson, Aditi S. Krishnapriyan, Aydin Buluc, Dmitriy Morozov, and Katherine Yelick

TL;DR
PersGNN is a deep learning model that combines topological data analysis with geometric deep learning for structure-based protein function prediction, and it outperforms either technique used on its own.
Contribution
This work introduces PersGNN, an end-to-end trainable hybrid model that integrates topological features with learned graph representations to predict protein function from structure alone.
Findings
PersGNN achieves a 9.3% increase in AUPR over baseline models.
High F1 scores suggest the model transfers across Gene Ontology (GO) categories.
Hybrid approach outperforms individual techniques in structure-based prediction.
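The findings report AUPR and F1, but this summary does not say how they were averaged over GO terms or which decision threshold F1 used. The sketch below shows one common way to compute both for multi-label function prediction, assuming step-wise average precision as the AUPR estimator, micro-averaging over every (protein, GO term) pair, and a hypothetical threshold of 0.5 for F1. It is an illustration of the metrics, not the paper's evaluation code.

```python
def average_precision(scores, labels):
    """Step-wise area under the precision-recall curve (average precision).

    Ranks predictions by descending score and averages the precision at each
    rank where a true positive occurs. Ties keep input order (stable sort).
    """
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    n_pos = sum(1 for y in labels if y)
    if n_pos == 0:
        return 0.0
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


def f1_score(scores, labels, threshold=0.5):
    """F1 of binary predictions obtained by thresholding the scores.

    The 0.5 default is an assumption, not a value taken from the paper.
    """
    tp = fp = fn = 0
    for s, y in zip(scores, labels):
        pred = s >= threshold
        if pred and y:
            tp += 1
        elif pred:
            fp += 1
        elif y:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_average(metric, score_rows, label_rows, **kwargs):
    """Micro-average: pool every (protein, GO term) pair, then apply `metric`."""
    flat_scores = [s for row in score_rows for s in row]
    flat_labels = [y for row in label_rows for y in row]
    return metric(flat_scores, flat_labels, **kwargs)
```

For scores `[0.9, 0.8, 0.3, 0.1]` with labels `[1, 0, 1, 0]`, the two positives sit at ranks 1 and 3, so average precision is (1 + 2/3) / 2 = 5/6, while thresholding at 0.5 gives one true positive, one false positive and one false negative, so F1 is 0.5. Macro-averaging (scoring each GO term separately, then averaging) is an equally common choice and can give noticeably different numbers when terms are imbalanced.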
Abstract
Understanding protein structure-function relationships is a key challenge in computational biology, with applications across the biotechnology and pharmaceutical industries. While it is known that protein structure directly impacts protein function, many functional prediction tasks use only protein sequence. In this work, we isolate protein structure to make functional annotations for proteins in the Protein Data Bank in order to study the expressiveness of different structure-based prediction schemes. We present PersGNN, an end-to-end trainable deep learning model that combines graph representation learning with topological data analysis to capture a complex set of both local and global structural features. While variations of these techniques have been successfully applied to proteins before, we demonstrate that our hybridized approach, PersGNN, outperforms either method on its own…
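The abstract describes a model that fuses a graph view of the protein (local structure) with a topological summary (global structure). The sketch below illustrates that idea in plain Python under several loudly stated assumptions: residues are given as 3D coordinates (for example C-alpha atoms), the graph links residues within a hypothetical cutoff, the graph branch is an untrained mean-aggregation message-passing step rather than a learned GNN layer, and the topological branch uses 0-dimensional Vietoris-Rips persistence vectorized as a simple Betti curve. PersGNN's actual layers, persistence features, and vectorization are not specified here, so none of this should be read as the authors' implementation.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Finite 0-dimensional persistence bars of the Vietoris-Rips filtration.

    Every point is born at scale 0; a connected component dies at the edge
    length that merges it into another (Kruskal's algorithm with union-find).
    The single component that never dies (the infinite bar) is omitted.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    bars = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            bars.append((0.0, length))
    return bars


def betti_curve(bars, thresholds):
    """Fixed-length vector: number of finite bars alive at each threshold."""
    return [sum(1 for b, d in bars if b <= t < d) for t in thresholds]


def contact_graph(points, cutoff):
    """Adjacency lists linking residues whose coordinates lie within `cutoff`."""
    adj = [[] for _ in points]
    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= cutoff:
            adj[i].append(j)
            adj[j].append(i)
    return adj


def message_pass(features, adj):
    """One untrained round: each node averages itself with its neighbours."""
    out = []
    for i, h in enumerate(features):
        group = [h] + [features[j] for j in adj[i]]
        out.append([sum(col) / len(group) for col in zip(*group)])
    return out


def hybrid_embedding(points, features, thresholds, cutoff=8.0, rounds=2):
    """Concatenate a mean-pooled graph embedding with a topological vector.

    `cutoff` and `rounds` are hypothetical defaults chosen for illustration.
    """
    adj = contact_graph(points, cutoff)
    h = features
    for _ in range(rounds):
        h = message_pass(h, adj)
    graph_vec = [sum(col) / len(h) for col in zip(*h)]  # mean pooling
    topo_vec = betti_curve(h0_persistence(points), thresholds)
    return graph_vec + topo_vec


if __name__ == "__main__":
    # Toy four-residue chain with two-dimensional per-residue features.
    coords = [(0, 0, 0), (4, 0, 0), (8, 0, 0), (12, 0, 0)]
    feats = [[1, 0], [0, 1], [1, 0], [0, 1]]
    print(hybrid_embedding(coords, feats, thresholds=[0.0, 5.0], cutoff=5.0))
```

In the toy chain, neighbouring residues are 4 units apart, so three finite H0 bars die at 4 and the topological part of the embedding is `[3, 0]` for thresholds 0 and 5. The design point is that the two branches see different things: message passing only mixes information along contacts within the cutoff, while the persistence summary records how the whole point cloud connects up across all scales. In a trainable model, both branches would carry learned weights and the concatenated vector would feed a multi-label classifier over GO terms.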
Taxonomy
Topics: Bioinformatics and Genomic Networks · Topological and Geometric Data Analysis · Machine Learning in Bioinformatics
