Many Needles in a Haystack: Active Hit Discovery for Perturbation Experiments
Andrea Rubbi, Arpit Merchant, Samuel Ogden, Amir Akbarnejad, Pietro Liò, Sattar Vakili, Mo Lotfollahi

TL;DR
This paper introduces Probability-of-Hit, a Bayesian optimization acquisition function for efficient hit discovery in gene perturbation experiments. It ranks candidates by their posterior probability of exceeding a phenotypic threshold, rather than searching for a single global optimum as standard Bayesian optimization does.
Contribution
It formalizes hit discovery as a sequential design problem and proposes a novel acquisition function with proven asymptotic optimality.
Findings
Up to 6.4% improvement over baselines on biological datasets
Strong empirical performance on synthetic and real data
Asymptotic optimality of the proposed method
Abstract
High-throughput gene perturbation experiments can test several genetic interventions in parallel, yet experimental budgets remain limited. A central goal is hit discovery: identifying as many perturbations as possible whose phenotypic effect exceeds a predefined threshold. Pure exploration strategies are statistically inefficient, wasting budget on low-value regions. Bayesian optimization methods offer a principled alternative but target a single global optimum, over-exploiting dominant modes while neglecting other high-value regions. We formalize hit discovery as a sequential experimental design problem and propose Probability-of-Hit, an acquisition function that directly targets threshold exceedance by ranking candidates according to their posterior probability of being a hit. We prove asymptotic optimality of this approach and demonstrate strong empirical performance on both…
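As a rough illustration of ranking candidates by posterior probability of being a hit, the sketch below assumes each candidate's posterior over its effect is Gaussian with a known mean and standard deviation. That assumption, and the names `probability_of_hit`, `rank_by_probability_of_hit`, and the toy candidates, are hypothetical and are not taken from the paper; the authors' surrogate model and selection procedure may differ.

```python
import math


def probability_of_hit(mean, std, threshold):
    """P(effect > threshold) under an ASSUMED Gaussian posterior N(mean, std^2)."""
    if std <= 0.0:
        # Degenerate posterior: the effect is treated as known exactly.
        return 1.0 if mean > threshold else 0.0
    z = (mean - threshold) / std
    # Standard normal CDF evaluated at z, written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def rank_by_probability_of_hit(candidates, threshold, batch_size):
    """Return the names of the batch_size candidates most likely to be hits.

    candidates maps a (hypothetical) perturbation name to (mean, std).
    """
    ranked = sorted(
        candidates,
        key=lambda name: probability_of_hit(*candidates[name], threshold),
        reverse=True,
    )
    return ranked[:batch_size]


# Toy posterior summaries, invented for illustration only.
toy_candidates = {
    "pert_a": (0.9, 0.05),  # just below threshold, very certain
    "pert_b": (0.5, 1.0),   # well below threshold, very uncertain
    "pert_c": (1.2, 0.1),   # above threshold, fairly certain
}
selected = rank_by_probability_of_hit(toy_candidates, threshold=1.0, batch_size=2)
```

In this toy setup the uncertain candidate `pert_b` outranks `pert_a`, even though its mean is lower, because more of its posterior mass lies above the threshold. A mean-only ranking would choose the opposite order.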
