Data-centric challenges with the application and adoption of artificial intelligence for drug discovery
Ghita Ghislat, Saiveth Hernandez-Hernandez, Chayanit Piyawajanusorn, Pedro J. Ballester

TL;DR
This paper discusses the various data-centric challenges in applying AI to drug discovery, including data quality issues, evaluation pitfalls, and human biases, and suggests mitigation strategies to improve AI model reliability and prospective utility.
Contribution
It provides a comprehensive overview of data-related challenges in AI-driven drug discovery and highlights effective mitigation approaches and the importance of prospective validation.
Findings
Data issues like bias and high dimensionality hinder AI performance.
Uncertainty quantification techniques, intended to make model predictions more trustworthy, face challenges of their own.
Many AI models lack prospective validation and real-world applicability.
Abstract
Introduction: Artificial intelligence (AI) is exhibiting tremendous potential to reduce the massive costs and long timescales of drug discovery. There are however important challenges currently limiting the impact and scope of AI models. Areas covered: In this perspective, the authors discuss a range of data issues (bias, inconsistency, skewness, irrelevance, small size, high dimensionality), how they challenge AI models, and which issue-specific mitigations have been effective. Next, they point out the challenges faced by uncertainty quantification techniques aimed at enhancing and trusting the predictions from these AI models. They also discuss how conceptual errors, unrealistic benchmarks and performance misestimation can confound the evaluation of models and thus their development. Lastly, the authors explain how human bias, whether from AI experts or drug discovery experts,…
Taxonomy
Topics: Computational Drug Discovery Methods · Genetics, Bioinformatics, and Biomedical Research
