Generalization within in silico screening
Andreas Loukas, Pan Kessel, Vladimir Gligorijevic, Richard Bonneau

TL;DR
This paper analyzes how the selectivity of the selection policy in in silico screening affects a predictive model's ability to generalize. It argues for evaluating batch-level predictions rather than individual compound labels, and supports the argument with theoretical and empirical results.
Contribution
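It extends learning theory to quantify how the selection policy influences generalization in in silico screening, and shows that predicting batch-level properties, such as the fraction of desired outcomes in a batch, generalizes better than predicting each compound's label.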
Findings
The risk of errors is higher when the policy selects only predicted positives and when the target property is rare.
Predicting the fraction of desired outcomes in a batch markedly improves generalization.
Empirical validation across diverse tasks confirms theoretical predictions.
Abstract
In silico screening uses predictive models to select a batch of compounds with favorable properties from a library for experimental validation. Unlike conventional learning paradigms, success in this context is measured by the performance of the predictive model on the selected subset of compounds rather than the entire set of predictions. By extending learning theory, we show that the selectivity of the selection policy can significantly impact generalization, with a higher risk of errors occurring when exclusively selecting predicted positives and when targeting rare properties. Our analysis suggests a way to mitigate these challenges. We show that generalization can be markedly enhanced when considering a model's ability to predict the fraction of desired outcomes in a batch. This is promising, as the primary aim of screening is not necessarily to pinpoint the label of each compound…
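The distinction the abstract draws between performance on the whole library, performance on the selected subset, and the predicted fraction of desired outcomes in a batch can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the paper's method: the synthetic score model, the top-k selection policy, the use of the mean score as the predicted batch fraction, and all names and parameter values (`simulate_library`, `prevalence`, `noise`, `k`) are hypothetical choices made for illustration.

```python
import random


def simulate_library(n, prevalence, noise, rng):
    """Return (labels, scores) for a synthetic library.

    Assumption: scores are noisy pseudo-probabilities clipped to [0, 1];
    this is an illustrative model, not one taken from the paper.
    """
    labels = [1 if rng.random() < prevalence else 0 for _ in range(n)]
    scores = [
        min(1.0, max(0.0, 0.8 * y + 0.1 + rng.gauss(0.0, noise)))
        for y in labels
    ]
    return labels, scores


def select_top_k(scores, k):
    """Indices of the k highest-scoring compounds (selective when k << n)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def library_accuracy(labels, scores, threshold=0.5):
    """Fraction of correct thresholded predictions over the entire library."""
    hits = sum((s >= threshold) == bool(y) for y, s in zip(labels, scores))
    return hits / len(labels)


def batch_precision(labels, batch):
    """Actual fraction of desired outcomes among the selected compounds."""
    return sum(labels[i] for i in batch) / len(batch)


def predicted_batch_fraction(scores, batch):
    """Predicted fraction of desired outcomes, taken here as the mean score."""
    return sum(scores[i] for i in batch) / len(batch)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs match
    labels, scores = simulate_library(n=5000, prevalence=0.02, noise=0.25, rng=rng)
    batch = select_top_k(scores, k=50)
    print("accuracy over whole library:", library_accuracy(labels, scores))
    print("actual positive fraction in batch:", batch_precision(labels, batch))
    print("predicted positive fraction in batch:", predicted_batch_fraction(scores, batch))
```

Comparing the three printed numbers for different values of `prevalence` and `k` gives a rough feel for how a model that looks accurate over the whole library can still behave differently on the small, highly selected batch that is actually sent for experimental validation.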
Taxonomy
Topics: Machine Learning and Data Classification · Machine Learning and Algorithms · Computational Drug Discovery Methods
