Many Needles in a Haystack: Active Hit Discovery for Perturbation Experiments

Andrea Rubbi; Arpit Merchant; Samuel Ogden; Amir Akbarnejad; Pietro Liò; Sattar Vakili; Mo Lotfollahi

arXiv:2605.10196·cs.LG·May 12, 2026

TL;DR

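This paper introduces Probability-of-Hit, an acquisition function for efficient hit discovery in gene perturbation experiments. Rather than searching for a single global optimum as standard Bayesian optimization does, it ranks candidates by their posterior probability of exceeding a predefined effect threshold, and it is reported to outperform baseline approaches.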

Contribution

The paper formalizes hit discovery as a sequential experimental design problem and proposes a novel acquisition function, Probability-of-Hit, with a proof of asymptotic optimality.

Findings

01. Up to 6.4% improvement over baselines on biological datasets

02. Strong empirical performance on synthetic and real data

03. Asymptotic optimality of the proposed method

Abstract

High-throughput gene perturbation experiments can test several genetic interventions in parallel, yet experimental budgets remain limited. A central goal is hit discovery: identifying as many perturbations as possible whose phenotypic effect exceeds a predefined threshold. Pure exploration strategies are statistically inefficient, wasting budget on low-value regions. Bayesian optimization methods offer a principled alternative but target a single global optimum, over-exploiting dominant modes while neglecting other high-value regions. We formalize hit discovery as a sequential experimental design problem and propose Probability-of-Hit, an acquisition function that directly targets threshold exceedance by ranking candidates according to their posterior probability of being a hit. We prove asymptotic optimality of this approach and demonstrate strong empirical performance on both…
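
The ranking rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each candidate's posterior over its phenotypic effect is Gaussian with a known mean and standard deviation, in which case the posterior probability of being a hit is the Gaussian tail probability above the threshold. The function names `probability_of_hit` and `select_batch`, the batch-selection step, and the example numbers are all hypothetical.

```python
import math


def probability_of_hit(mean, std, threshold):
    """Posterior probability that the effect exceeds `threshold`.

    Assumption (not from the paper): the posterior is Gaussian,
    N(mean, std**2), so the probability is the upper-tail mass.
    """
    if std <= 0.0:
        # Degenerate posterior: the effect is known exactly.
        return 1.0 if mean > threshold else 0.0
    z = (mean - threshold) / std
    # Standard normal CDF evaluated at z, written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def select_batch(means, stds, threshold, tested, batch_size):
    """Return indices of the untested candidates with the highest
    probability of hit (ties broken by lower index)."""
    scored = [
        (probability_of_hit(m, s, threshold), i)
        for i, (m, s) in enumerate(zip(means, stds))
        if i not in tested
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:batch_size]]


# Illustrative posterior summaries for four hypothetical perturbations.
means = [0.2, 1.5, 0.9, 1.1]
stds = [0.1, 0.3, 0.5, 0.05]
batch = select_batch(means, stds, threshold=1.0, tested=set(), batch_size=2)
print(batch)
```

In this toy example, candidate 3 is ranked ahead of candidate 1 even though its posterior mean is lower, because its small uncertainty makes exceeding the threshold more probable. A rule that targets the single largest effect would instead favor candidate 1, which is the contrast the abstract draws with standard Bayesian optimization.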
