Insights into performance evaluation of compound-protein interaction prediction methods
Adiba Yaseen (1), Imran Amin (2), Naeem Akhter (1), Asa Ben-Hur (3), and Fayyaz Minhas (4) ((1) Department of Computer and Information Sciences (DCIS), Pakistan Institute of Engineering and Applied Sciences (PIEAS), Islamabad, Pakistan, (2) National Institute for Biotechnology

TL;DR
This study critically evaluates machine learning methods for predicting compound-protein interactions, highlighting issues in experimental design, and introduces a kernel-based approach that outperforms existing models in accuracy and real-world screening applications.
Contribution
The paper identifies key overlooked factors affecting CPI predictor performance and proposes a simple kernel-based method that surpasses state-of-the-art models in prediction accuracy.
Findings
Similarity between training and test data significantly impacts performance estimates.
Random negative example generation yields better generalization than complex strategies.
Kernel-based approach outperforms CPI-NN in predictive accuracy.
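The random negative generation mentioned in the findings can be illustrated with a minimal sketch. This summary does not give the paper's exact procedure, so the function name `sample_random_negatives`, the toy identifiers, and the choice to draw only from compounds and proteins that already appear in the positive set are all assumptions made for illustration.

```python
import random


def sample_random_negatives(positives, n_negatives, seed=0):
    """Draw random (compound, protein) pairs that are not known positives.

    Hypothetical helper: unlabeled pairs are treated as negatives, which is
    an assumption, since some of them may be undiscovered interactions.
    """
    positive_set = set(positives)
    compounds = sorted({c for c, _ in positive_set})
    proteins = sorted({p for _, p in positive_set})

    # Guard against asking for more negatives than unlabeled pairs exist.
    max_negatives = len(compounds) * len(proteins) - len(positive_set)
    if n_negatives > max_negatives:
        raise ValueError("not enough unlabeled pairs to sample from")

    rng = random.Random(seed)
    negatives = set()
    while len(negatives) < n_negatives:
        pair = (rng.choice(compounds), rng.choice(proteins))
        if pair not in positive_set:
            negatives.add(pair)
    return sorted(negatives)


# Toy data (made up): three known interactions over 3 compounds, 2 proteins.
positives = [("c1", "p1"), ("c2", "p2"), ("c3", "p1")]
negatives = sample_random_negatives(positives, n_negatives=3)
```

Fixing the seed keeps the negative set reproducible across runs, which matters when comparing predictors trained on the same data.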
Abstract
Motivation: Machine learning-based prediction of compound-protein interactions (CPIs) is important for drug design, screening, and repurposing studies and can improve the efficiency and cost-effectiveness of wet-lab assays. Despite the publication of many research papers reporting CPI predictors in recent years, we have observed a number of fundamental issues in experiment design that lead to overoptimistic estimates of model performance. Results: In this paper, we analyze the impact of several important factors affecting the generalization performance of CPI predictors that are overlooked in existing work: (1) similarity between training and test examples in cross-validation; (2) the strategy for generating negative examples in the absence of experimentally verified negative examples; (3) the choice of evaluation protocols and performance metrics and their alignment with real-world use of…
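One common way to limit similarity between training and test examples is to keep whole groups of similar proteins together in a single cross-validation fold. The sketch below illustrates that idea only; it is not the paper's protocol, which this excerpt does not specify. The function `grouped_folds`, the precomputed `cluster_of` mapping (for example, from sequence-similarity clustering), and the toy data are assumptions.

```python
from collections import defaultdict


def grouped_folds(pairs, cluster_of, n_folds):
    """Yield (train_idx, test_idx) so that no protein cluster spans both.

    `pairs` is a list of (compound, protein) tuples; `cluster_of` maps each
    protein to a precomputed cluster label (assumed to be given).
    """
    by_cluster = defaultdict(list)
    for idx, (_, protein) in enumerate(pairs):
        by_cluster[cluster_of[protein]].append(idx)

    # Greedy balancing: place the largest clusters first, each into the
    # fold that currently holds the fewest examples.
    folds = [[] for _ in range(n_folds)]
    for cluster in sorted(by_cluster, key=lambda c: (-len(by_cluster[c]), c)):
        smallest = min(range(n_folds), key=lambda f: len(folds[f]))
        folds[smallest].extend(by_cluster[cluster])

    for test_fold in range(n_folds):
        test_idx = sorted(folds[test_fold])
        train_idx = sorted(
            i for f in range(n_folds) if f != test_fold for i in folds[f]
        )
        yield train_idx, test_idx


# Toy data (made up): p1 and p2 are assumed similar, so they share cluster A.
pairs = [("c1", "p1"), ("c2", "p1"), ("c3", "p2"), ("c4", "p3"), ("c5", "p3")]
cluster_of = {"p1": "A", "p2": "A", "p3": "B"}
splits = list(grouped_folds(pairs, cluster_of, n_folds=2))
```

The same grouping could be applied to compounds, or to both sides of each pair, depending on which kind of novelty the intended screening scenario involves.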
Taxonomy
Topics: Computational Drug Discovery Methods · Bioinformatics and Genomic Networks · Protein Structure and Dynamics
