A Supervised Machine Learning Approach for Sequence Based Protein-protein Interaction (PPI) Prediction
Soumyadeep Debnath, Ayatullah Faruk Mollah

TL;DR
This paper presents a supervised machine learning method for predicting protein-protein interactions based on sequence data, emphasizing dataset quality and competitive evaluation.
Contribution
It introduces a sequence-based PPI prediction model developed for the SeqPIP competition, which supplied high-quality, bias-free datasets and benchmarked the submitted models against one another.
Findings
Effective PPI prediction from sequence data
High accuracy on independent test datasets
Competitive performance in the SeqPIP challenge
Abstract
Computational protein-protein interaction (PPI) prediction techniques can contribute greatly to reducing the time, cost and false-positive interactions associated with experimental approaches. Sequence is one of the primary and most important sources of information about a protein, and it plays a crucial role in PPI prediction. Several machine learning approaches have been applied to exploit the characteristics of PPI datasets. However, these datasets strongly influence the performance of predictive models, so care should be taken with both dataset curation and the design of the models themselves. Here, we describe our submitted solution, together with its results, for the SeqPIP competition, whose objective was to develop comprehensive PPI predictive models from sequence information using high-quality, bias-free interaction datasets. We were given a training set of 2000 positive and 2000 negative interactions, together with the corresponding protein sequences. Our…
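The excerpt does not state which sequence features or classifier the authors used (the abstract is truncated), so the following is only a minimal sketch of a generic supervised, sequence-based PPI classifier under assumptions of our own: each protein is encoded by its amino-acid composition (AAC), a pair is encoded order-invariantly by the elementwise sum and product of the two AAC vectors, and a plain logistic regression is trained by full-batch gradient descent. The function names (`aac`, `pair_features`, `train_logreg`, `predict`) and all hyperparameters are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

# Assumption: the 20 standard amino acids; other symbols (e.g. X, U) are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(seq: str) -> List[float]:
    """Amino-acid composition: fraction of each standard residue in `seq`."""
    seq = seq.upper()
    counts = [seq.count(a) for a in AMINO_ACIDS]
    total = sum(counts) or 1  # avoid division by zero if no standard residues
    return [c / total for c in counts]


def pair_features(seq_a: str, seq_b: str) -> List[float]:
    """Hypothetical order-invariant pair vector: elementwise sum and product of the two AACs."""
    fa, fb = aac(seq_a), aac(seq_b)
    return [x + y for x, y in zip(fa, fb)] + [x * y for x, y in zip(fa, fb)]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lr: float = 0.5,
    epochs: int = 500,
    l2: float = 1e-3,
) -> Tuple[List[float], float]:
    """Fit L2-regularised logistic regression with full-batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss with respect to the logit is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        # Average the gradient over the batch; the L2 penalty applies to weights only.
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: List[float], b: float, x: Sequence[float], threshold: float = 0.5) -> int:
    """Return 1 (interacting) if the predicted probability reaches `threshold`, else 0."""
    p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return int(p >= threshold)
```

With a labelled set such as the 2000 positive and 2000 negative pairs described above, one would call `pair_features` on every pair, fit with `train_logreg`, and score held-out pairs with `predict`; because the classes are balanced, plain accuracy is a reasonable first metric. The sum/product encoding is chosen so that the prediction for (A, B) equals the prediction for (B, A), since interaction is a symmetric relation.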
Taxonomy
Topics: Microbial Metabolic Engineering and Bioproduction · Bioinformatics and Genomic Networks · Protein Structure and Dynamics
