Reliable OOD Virtual Screening with Extrapolatory Pseudo-Label Matching
Yunni Qu (1), Bhargav Vaduri (1), Karthikeya Jatoth (1), James Wellnitz (2), Dzung Dinh (1), Seth Veenbaas (2), Jonathan Chapman (2), Alexander Tropsha (2), Junier Oliva (1) ((1) Department of Computer Science, University of North Carolina at Chapel Hill

TL;DR
This paper introduces EXPLOR, a virtual screening framework that improves both extrapolation to out-of-distribution (OOD) chemical space and the reliability of confidence estimates, so that promising drug candidates can be identified more reliably from minimal training data.
Contribution
EXPLOR employs extrapolatory pseudo-labeling and a multi-headed architecture to enhance OOD extrapolation and confidence reliability in virtual screening without needing unlabeled test data.
Findings
State-of-the-art performance on chemical benchmarks
Effective in high-confidence regions for candidate selection
Robust OOD extrapolation with minimal training data
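To make the idea of candidate selection in high-confidence regions concrete, the following is a minimal sketch of how a fixed experimental budget might be spent on the most confident predictions, and how the hit rate of that selection could be measured. It is not EXPLOR's method: the function names `top_k_by_confidence` and `precision_at_k`, the toy confidence scores, and the toy activity labels are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def top_k_by_confidence(confidences: Sequence[float], k: int) -> List[int]:
    """Return indices of the k highest-confidence candidates (ties broken by index)."""
    if k < 0:
        raise ValueError("k must be non-negative")
    order = sorted(range(len(confidences)), key=lambda i: (-confidences[i], i))
    return order[:k]


def precision_at_k(confidences: Sequence[float], labels: Sequence[int], k: int) -> float:
    """Fraction of true actives among the k most confident candidates."""
    chosen = top_k_by_confidence(confidences, k)
    if not chosen:
        return 0.0
    return sum(labels[i] for i in chosen) / len(chosen)


# Toy, made-up data: model confidence that each compound is active,
# and the (normally unknown) experimental outcome, 1 = active, 0 = inactive.
confidences = [0.91, 0.15, 0.78, 0.60, 0.97, 0.33]
labels = [1, 0, 1, 0, 1, 0]

# With a budget of three assays, test only the three most confident candidates.
selected = top_k_by_confidence(confidences, k=3)
p_at_3 = precision_at_k(confidences, labels, k=3)
print(selected, p_at_3)
```

On this toy data the three selected compounds are all active, and widening the budget to four admits one inactive compound. A model whose confidence estimates remain reliable under distribution shift would be expected to keep this precision high when the candidates come from OOD chemical space.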
Abstract
Machine learning (ML) models are increasingly deployed for virtual screening in drug discovery, where the goal is to identify novel, chemically diverse scaffolds while minimizing experimental costs. This creates a fundamental challenge: the most valuable discoveries lie in out-of-distribution (OOD) regions beyond the training data, yet ML models often degrade under distribution shift. Standard novelty-rejection strategies ensure reliability within the training domain but limit discovery by rejecting precisely the novel scaffolds most worth finding. Moreover, experimental budgets permit testing only a small fraction of nominated candidates, demanding models that produce reliable confidence estimates. We introduce EXPLOR (Extrapolatory Pseudo-Label Matching for OOD Uncertainty-Based Rejection), a framework that addresses both challenges through extrapolatory pseudo-labeling on…
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Forecasting Techniques and Applications · Air Traffic Management and Optimization
Methods: Sparse Evolutionary Training · Balanced Selection
