Inference for Hit Enrichment Curves, with Applications to Drug Discovery
Jeremy R. Ash, Jacqueline M. Hughes-Oliver

TL;DR
This paper develops statistical methods for quantifying the uncertainty in hit enrichment curves used in drug discovery, enabling more reliable comparisons of ranking algorithms both at specific testing fractions and along the entire curve.
Contribution
The paper introduces new inferential procedures, including hypothesis tests and confidence bands, that account for the two sources of correlation in hit enrichment curves: correlation across testing fractions within a single algorithm, and correlation between competing algorithms.
Findings
The EmProc method outperforms the other methods for pointwise inference
Confidence bands provide reliable simultaneous coverage
The procedures extend to enrichment factors
Abstract
In virtual screening for drug discovery, hit enrichment curves are widely used to assess the performance of ranking algorithms with regard to their ability to identify early enrichment. Unfortunately, researchers almost never consider the uncertainty associated with estimating such curves before declaring differences between the performance of competing algorithms. Appropriate inference is complicated by two sources of correlation that are often overlooked: correlation across different testing fractions within a single algorithm, and correlation between competing algorithms. Additionally, researchers are often interested in making comparisons along the entire curve, not only at a few testing fractions. We develop inferential procedures to address the needs of both those interested in a few testing fractions and those interested in the entire curve. For the former, four hypothesis…
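For readers unfamiliar with the quantity being estimated, the following is a minimal sketch of how an empirical hit enrichment curve and the corresponding enrichment factors are commonly computed from a ranked list. It assumes the usual definitions: the curve value at testing fraction r is the share of all actives found among the top r fraction of ranked compounds (recall), and the enrichment factor is that recall divided by r. The function names, the tie-breaking rule, and the toy data are illustrative assumptions, not taken from the paper, and the sketch covers only the point estimates, not the inferential procedures the paper develops.

```python
import numpy as np

def hit_enrichment_curve(scores, labels, fractions):
    """Empirical recall at each testing fraction (hypothetical helper).

    scores    : higher score = ranked earlier (assumption)
    labels    : 1 for an active compound, 0 for an inactive one
    fractions : testing fractions in (0, 1]
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    # Sort by decreasing score; ties keep input order (an arbitrary choice).
    order = np.argsort(-scores, kind="stable")
    cum_hits = np.cumsum(labels[order])
    n = len(labels)
    total_hits = cum_hits[-1]
    recall = []
    for r in fractions:
        # Number of compounds tested at fraction r; the small offset guards
        # against floating-point overshoot in r * n before rounding up.
        k = int(np.ceil(r * n - 1e-9))
        k = min(max(k, 1), n)
        recall.append(cum_hits[k - 1] / total_hits)
    return np.array(recall)

def enrichment_factor(recall, fractions):
    """Enrichment factor = recall / testing fraction (assumed definition)."""
    return np.asarray(recall) / np.asarray(fractions, dtype=float)

# Toy data: 10 compounds, 4 actives (made up for illustration).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 0, 1, 0, 0, 1, 0, 0, 0, 1]
fractions = [0.2, 0.5, 1.0]

recall = hit_enrichment_curve(scores, labels, fractions)
ef = enrichment_factor(recall, fractions)
print(recall)  # recall at 20%, 50%, and 100% of compounds tested
print(ef)
```

Because every point on the curve is built from the same cumulative hit counts, estimates at nearby testing fractions are correlated, and two algorithms scored on the same compounds are correlated as well. These are the two dependence structures the abstract says are often overlooked.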
Taxonomy
Topics: Computational Drug Discovery Methods · Statistical Methods in Clinical Trials · Statistical Methods and Inference
