Machine Learning for Protein Function
Dan Ofer

TL;DR
This thesis develops machine learning (ML) feature engineering methods and tools, including NeuroPID and ProFET, for identifying and classifying proteins that share a function but have low sequence similarity, enabling large-scale discovery and analysis.
Contribution
Introduces novel feature engineering frameworks and tools for protein classification, improving identification of functionally related proteins with minimal sequence similarity.
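The contribution rests on engineered features that can place functionally related proteins close together even when their sequences align poorly. Below is a minimal sketch of one such feature family. It assumes a hypothetical four-class reduced amino-acid alphabet: each residue is replaced by its class, and the frequencies of class k-mers form a fixed-length vector. The groupings, `REDUCED_ALPHABET`, and the function names are illustrative only and are not ProFET's API or the thesis's exact feature set.

```python
from collections import Counter
from itertools import product

# Hypothetical 4-class reduced alphabet (illustrative only, not taken from
# the thesis): residues are grouped by broad physicochemical character.
REDUCED_ALPHABET = {
    **{aa: "h" for aa in "AVLIMFWC"},  # hydrophobic
    **{aa: "p" for aa in "STNQGPYH"},  # polar, small or conformationally special
    **{aa: "+" for aa in "KR"},        # positively charged
    **{aa: "-" for aa in "DE"},        # negatively charged
}
LETTERS = sorted(set(REDUCED_ALPHABET.values()))  # ['+', '-', 'h', 'p']


def reduce_sequence(seq: str) -> str:
    """Rewrite a protein sequence in the reduced alphabet.

    Residues outside the 20 standard amino acids (e.g. X) are dropped,
    which joins their neighbours; a fuller tool might split on them instead.
    """
    return "".join(REDUCED_ALPHABET[aa] for aa in seq.upper() if aa in REDUCED_ALPHABET)


def reduced_kmer_frequencies(seq: str, k: int = 2) -> dict[str, float]:
    """Relative frequency of every possible reduced-alphabet k-mer."""
    reduced = reduce_sequence(seq)
    n_windows = max(len(reduced) - k + 1, 0)
    counts = Counter(reduced[i:i + k] for i in range(n_windows))
    all_kmers = ("".join(p) for p in product(LETTERS, repeat=k))
    # Every k-mer gets an entry, so all sequences yield vectors of equal length.
    return {kmer: counts[kmer] / max(n_windows, 1) for kmer in all_kmers}


# Two toy fragments that are identical at only 1 of 5 positions (L) ...
a, b = "AKDLV", "VRELI"
# ... map to the same reduced string, hence identical feature vectors.
print(reduce_sequence(a), reduce_sequence(b))  # h+-hh h+-hh
print(reduced_kmer_frequencies(a) == reduced_kmer_frequencies(b))  # True
```

Collapsing residues into classes trades resolution for robustness to conservative substitutions (K/R, D/E, V/I); how coarse the alphabet should be is a design choice that would normally be tuned per task.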
Findings
ProFET achieves state-of-the-art performance on benchmark datasets.
NeuroPID enables large-scale discovery of neuropeptides and neuropeptide precursors (NPPs).
Tools are applicable to diverse high-level protein functions.
Abstract
Systematic identification of protein function is a key problem in current biology. Most traditional methods fail to identify functionally equivalent proteins when those proteins lack similar sequences, structural data, or extensive manual annotations. In this thesis, I focused on feature engineering and machine learning methods for identifying diverse classes of proteins that are functionally related but share little sequence or structural similarity, notably neuropeptide precursors (NPPs). My aim is to identify functional protein classes using only unannotated primary protein sequences from any organism. The thesis covers feature representations of whole protein sequences, sequence-derived engineered features and their extraction, frameworks for using these features in machine learning (ML) models, and the application of ML models to biological tasks, with an emphasis on high-level protein functions. I implemented…
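The abstract describes whole-sequence feature representations that are then consumed by ML models. As a minimal sketch, this assumes plain amino-acid composition as the feature set and a nearest-centroid rule as a deliberately simple stand-in for whatever classifiers the thesis actually uses. `composition`, `train_centroids`, `predict`, and the toy sequences and class labels are all hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq: str) -> list[float]:
    """Whole-sequence feature vector: fraction of each of the 20 standard residues."""
    seq = seq.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts) or 1  # avoid division by zero for empty/unknown-only input
    return [c / total for c in counts]


def train_centroids(labeled: list[tuple[str, str]]) -> dict[str, list[float]]:
    """Average the feature vectors of each class (a nearest-centroid model)."""
    sums: dict[str, list[float]] = {}
    counts: dict[str, int] = {}
    for seq, label in labeled:
        vec = composition(seq)
        prev = sums.get(label, [0.0] * len(vec))
        sums[label] = [s + v for s, v in zip(prev, vec)]
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in vec] for label, vec in sums.items()}


def predict(centroids: dict[str, list[float]], seq: str) -> str:
    """Assign the class whose centroid is closest in Euclidean distance."""
    vec = composition(seq)
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))


# Synthetic toy training set with made-up class labels (not real proteins).
TOY_TRAINING = [
    ("KRKKRGKRKK", "basic"),
    ("RKRRKGRRKR", "basic"),
    ("LLIVLVILAL", "hydrophobic"),
    ("VLILLAVIVL", "hydrophobic"),
]
centroids = train_centroids(TOY_TRAINING)
print(predict(centroids, "KKRRKRKGKR"))  # basic
print(predict(centroids, "ILVVLLIAVI"))  # hydrophobic
```

Because every sequence maps to a vector of the same length regardless of its own length, any standard classifier could replace the centroid rule, and richer engineered features would simply extend the vector.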
Taxonomy
Topics: Machine Learning in Bioinformatics · Protein Structure and Dynamics · Advanced Proteomics Techniques and Applications
