Comparing two deep learning sequence-based models for protein-protein interaction prediction
Florian Richoux, Charlène Servantie, Cynthia Borès, Stéphane Téletchéa

TL;DR
This paper compares two deep learning models for protein-protein interaction prediction, highlighting pitfalls like overfitting and information leaks, and demonstrates a model achieving over 78% accuracy on human PPI data.
Contribution
The paper presents a careful comparison of two deep learning models for PPI prediction and emphasizes the methodological rigor needed to avoid common pitfalls such as overfitting and information leaks, so that predictions are reliable and the approach can scale.
Findings
Best model correctly predicts over 78% of human PPIs under strict training and testing conditions
Highlights importance of avoiding overfitting and information leaks
Methodology supports scaling to larger datasets
Abstract
Biological data are extremely diverse and complex, but also quite sparse. Recent developments in deep learning offer new possibilities for the analysis of complex data. However, it is easy to obtain a deep learning model that seems to perform well but is in fact overfitting either the training data or the validation data. In particular, overfitting the validation data, called "information leak", is almost never addressed in papers proposing deep learning models to predict protein-protein interactions (PPIs). In this work, we compare two carefully designed deep learning models and show pitfalls to avoid when predicting PPIs with machine learning methods. Our best model accurately predicts more than 78% of human PPIs, under very strict conditions for both training and testing. The methodology we propose here allows us to have strong confidence about the ability of…
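The abstract does not describe how the data were split, so the following is only an illustrative sketch, not the authors' method. It assumes PPIs are given as unordered pairs of protein identifiers and shows one way to keep training, validation and test sets apart: since (A, B) and (B, A) describe the same interaction, pairs are canonicalized before splitting so that a reversed copy cannot leak into another set. The function names (`canonical`, `split_pairs`, `leaks`, `shared_proteins`) and the split fractions are hypothetical. `shared_proteins` reports proteins present in two sets, which is one common stricter criterion; whether it matches the paper's "very strict conditions" is an assumption.

```python
import random
from typing import Dict, Iterable, List, Set, Tuple

Pair = Tuple[str, str]


def canonical(pair: Pair) -> Pair:
    # PPIs are symmetric, so store each pair in a fixed (sorted) order.
    a, b = pair
    return (a, b) if a <= b else (b, a)


def split_pairs(pairs: Iterable[Pair], frac_val: float = 0.1,
                frac_test: float = 0.1, seed: int = 0):
    # Deduplicate after canonicalization, then shuffle reproducibly.
    unique: List[Pair] = sorted({canonical(p) for p in pairs})
    rng = random.Random(seed)
    rng.shuffle(unique)
    n_test = int(len(unique) * frac_test)
    n_val = int(len(unique) * frac_val)
    test = unique[:n_test]
    val = unique[n_test:n_test + n_val]
    train = unique[n_test + n_val:]
    return train, val, test


def leaks(*splits: Iterable[Pair]) -> Set[Pair]:
    # Return canonical pairs that occur in more than one split.
    first_split: Dict[Pair, int] = {}
    duplicated: Set[Pair] = set()
    for index, split in enumerate(splits):
        for pair in split:
            c = canonical(pair)
            if c in first_split and first_split[c] != index:
                duplicated.add(c)
            first_split.setdefault(c, index)
    return duplicated


def proteins(split: Iterable[Pair]) -> Set[str]:
    # All protein identifiers that appear in a split.
    return {p for pair in split for p in pair}


def shared_proteins(a: Iterable[Pair], b: Iterable[Pair]) -> Set[str]:
    # Proteins seen in both splits (a stricter, protein-level overlap check).
    return proteins(a) & proteins(b)
```

In this sketch, model and hyperparameter choices would be made using `val` only, and `test` would be scored once at the very end. Repeatedly tuning against the validation set is exactly the "information leak" the abstract warns about, so a final, untouched test set is what keeps the reported accuracy honest.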
Taxonomy
Topics: Bioinformatics and Genomic Networks · Computational Drug Discovery Methods · Protein Structure and Dynamics
