Scaffold Splits Overestimate Virtual Screening Performance
Qianrong Guo, Saiveth Hernandez-Hernandez, and Pedro J Ballester

TL;DR
This study demonstrates that scaffold splits, commonly used to evaluate virtual screening models, overestimate performance because they leave training and test sets overly similar, highlighting the need for more realistic data splitting methods.
Contribution
The paper reveals that scaffold splits overestimate virtual screening performance and advocates for more realistic splitting methods like UMAP clustering.
Findings
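Model performance drops significantly under UMAP clustering splits compared with scaffold splits.
Scaffold splits leave unrealistically high similarity between training and test molecules, because molecules with different scaffolds are often still similar.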
More realistic data splits are essential for accurate model evaluation.
Abstract
Virtual Screening (VS) of vast compound libraries guided by Artificial Intelligence (AI) models is a highly productive approach to early drug discovery. Data splitting is crucial for better benchmarking of such AI models. Traditional random data splits produce similar molecules between training and test sets, conflicting with the reality of VS libraries which mostly contain structurally distinct compounds. Scaffold split, grouping molecules by shared core structure, is widely considered to reflect this real-world scenario. However, here we show that the scaffold split also overestimates VS performance. The reason is that molecules with different chemical scaffolds are often similar, which hence introduces unrealistically high similarities between training molecules and test molecules following a scaffold split. Our study examined three representative AI models on 60 NCI-60 datasets,…
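To make the abstract's argument concrete, the following is a minimal sketch of a scaffold split followed by a check of how similar each test molecule is to its nearest training molecule. It is not the authors' pipeline. It assumes scaffold keys have already been computed (for example, Bemis-Murcko scaffold SMILES from a cheminformatics toolkit) and that fingerprints are given as sets of on-bit indices. The function names, the largest-group-first assignment rule, and the toy data are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def scaffold_split(scaffolds, test_fraction=0.2):
    """Split indices so that no scaffold key appears in both train and test.

    `scaffolds` is a list of precomputed scaffold keys (assumed input).
    Groups are assigned largest first; a group goes to train if it still
    fits under the training-size target, otherwise to test.
    """
    groups = defaultdict(list)
    for idx, key in enumerate(scaffolds):
        groups[key].append(idx)

    ordered = sorted(groups.values(), key=lambda g: (-len(g), g[0]))
    n_total = len(scaffolds)
    n_train_target = n_total - int(round(test_fraction * n_total))

    train, test = [], []
    for group in ordered:
        if len(train) + len(group) <= n_train_target:
            train.extend(group)
        else:
            test.extend(group)
    return train, test


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def max_train_similarity(fingerprints, train, test):
    """For each test molecule, similarity to its most similar training molecule."""
    return [
        max(tanimoto(fingerprints[t], fingerprints[i]) for i in train)
        for t in test
    ]


# Toy data (made up): six molecules, three scaffold keys.
scaffolds = ["A", "A", "A", "B", "B", "C"]
fingerprints = [
    {1, 2, 3, 4},
    {1, 2, 3, 5},
    {1, 2, 4, 5},
    {1, 2, 3, 4, 6},  # scaffold B, yet shares most bits with molecule 0
    {7, 8, 9},
    {7, 8, 10},
]

train, test = scaffold_split(scaffolds, test_fraction=0.34)
similarities = max_train_similarity(fingerprints, train, test)
print(train, test, similarities)
```

In this toy case the split keeps scaffolds disjoint between the two sets, yet one test molecule still has a very close neighbour in the training set. That is the kind of train-test similarity the abstract describes, shown here on invented data rather than on the NCI-60 datasets.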
Taxonomy
Topics: Molecular Biology Techniques and Applications
