ProtBoost: protein function prediction with Py-Boost and Graph Neural Networks -- CAFA5 top2 solution
Alexander Chervov, Anton Vakhrushev, Sergei Fironov, Loredana Martignetti

TL;DR
ProtBoost combines pretrained protein language models, gradient boosting, and Graph Neural Networks to improve protein function prediction, achieving second place in the CAFA5 challenge and demonstrating the effectiveness of hierarchical and graph-based modeling.
Contribution
The paper introduces ProtBoost, a novel method integrating GCNs, gradient boosting, and protein language models for enhanced protein function prediction, with scalable multi-target capabilities.
Findings
Ranked second in the CAFA5 challenge among more than 1600 participants.
Py-Boost can predict thousands of targets simultaneously.
Graph-based stacking significantly boosts prediction performance.
Abstract
Predicting protein properties, functions and localizations are important tasks in bioinformatics. Recent progress in machine learning offers opportunities for improving existing methods. We developed a new approach called ProtBoost, which relies on the strength of pretrained protein language models, the new Py-Boost gradient boosting method and Graph Neural Networks (GCN). The ProtBoost method was ranked the second-best model in the recent Critical Assessment of Functional Annotation (CAFA5) international challenge with more than 1600 participants. Py-Boost is the first gradient boosting method capable of predicting thousands of targets simultaneously, making it an ideal fit for tasks like the CAFA challenge. Our GCN-based approach performs stacking of many individual models and boosts the performance significantly. Notably, it can be applied to any task where targets are arranged in a…
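The graph-based stacking idea can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the targets are nodes of a graph (here a toy four-term hierarchy), that each node's features are the scores several base models gave that term for one protein, and that a single standard graph-convolution layer with untrained random weights mixes them. The hierarchy, the three base models, and the names `normalized_adjacency` and `gcn_layer` are all hypothetical.

```python
import numpy as np


def normalized_adjacency(edges, n_nodes):
    """Symmetric-normalized adjacency with self-loops: D^-1/2 (A + I) D^-1/2."""
    a = np.eye(n_nodes)
    for parent, child in edges:
        # Treat the hierarchy as undirected so information flows both ways.
        a[parent, child] = 1.0
        a[child, parent] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_hat, features, weights):
    """One graph-convolution layer: ReLU(A_hat @ X @ W)."""
    return np.maximum(a_hat @ features @ weights, 0.0)


# Toy hierarchy (assumption): term 0 is the root, 1 and 2 are its children,
# and 3 is a child of 1.
edges = [(0, 1), (0, 2), (1, 3)]
a_hat = normalized_adjacency(edges, n_nodes=4)

# Node features for one protein: one row per term, one column per
# hypothetical base model (made-up scores, for illustration only).
base_predictions = np.array([
    [0.90, 0.80, 0.95],
    [0.70, 0.60, 0.65],
    [0.10, 0.20, 0.05],
    [0.50, 0.40, 0.60],
])

rng = np.random.default_rng(0)
weights = rng.normal(size=(3, 2))  # untrained weights; a real model would learn these

# Each term's hidden representation now mixes the base-model scores of the
# term itself and of its neighbors in the hierarchy.
hidden = gcn_layer(a_hat, base_predictions, weights)
print(hidden.shape)
```

In a trained stacker, further layers and a per-node output head would map these hidden representations back to one score per term; the sketch shows only how neighboring terms exchange base-model scores.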
Taxonomy
Topics: Machine Learning in Bioinformatics
Methods: Ontology
