Automated Discovery of Pairwise Interactions from Unstructured Data
Zuheng (David) Xu, Moksh Jain, Ali Denton, Shawn Whitfield, Aniket Didolkar, Berton Earnshaw, Jason Hartford

TL;DR
This paper introduces two novel interaction tests for unstructured data that enable efficient discovery of pairwise perturbation interactions, validated through biological experiments and applicable to various data types.
Contribution
It develops new interaction tests based on pairwise interventions that work on unstructured data, enhancing the discovery of biological and system interactions.
Findings
Successfully identified more known biological interactions than baselines.
Validated tests on synthetic and real biological data.
Enabled interaction detection in unstructured data like images.
Abstract
Pairwise interactions between perturbations to a system can provide evidence for the causal dependencies of the underlying mechanisms of a system. When observations are low-dimensional, hand-crafted measurements, detecting interactions amounts to simple statistical tests, but it is not obvious how to detect interactions between perturbations affecting latent variables. We derive two interaction tests that are based on pairwise interventions, and show how these tests can be integrated into an active learning pipeline to efficiently discover pairwise interactions between perturbations. We illustrate the value of these tests in the context of biology, where pairwise perturbation experiments are frequently used to reveal interactions that are not observable from any single perturbation. Our tests can be run on unstructured data, such as the pixels in an image, which enables a…
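The low-dimensional case mentioned in the abstract can be illustrated with a minimal sketch. It assumes a scalar readout measured under four conditions (control, perturbation A, perturbation B, and both) and checks whether the double-perturbation effect departs from the sum of the two single effects. This is a textbook additive-interaction contrast, not either of the paper's two tests; the function names, sample sizes, and simulated effect sizes are all hypothetical.

```python
import numpy as np

def interaction_contrast(ctrl, a, b, ab):
    """Additive interaction contrast: mean(AB) - mean(A) - mean(B) + mean(ctrl).

    It is zero when the effect of applying both perturbations equals the
    sum of the two single-perturbation effects.
    """
    return np.mean(ab) - np.mean(a) - np.mean(b) + np.mean(ctrl)

def interaction_z_test(ctrl, a, b, ab):
    """Return (contrast, z) using a Welch-style standard error.

    Assumes the four groups are independent samples with more than one
    observation each.
    """
    groups = [np.asarray(g, dtype=float) for g in (ctrl, a, b, ab)]
    contrast = interaction_contrast(*groups)
    # Variance of a sum/difference of independent means adds up.
    se = np.sqrt(sum(g.var(ddof=1) / g.size for g in groups))
    return contrast, contrast / se

# Simulated scalar readouts (hypothetical effect sizes).
rng = np.random.default_rng(0)
n = 500
ctrl = rng.normal(0.0, 1.0, n)
a = rng.normal(1.0, 1.0, n)               # single effect of A: +1.0
b = rng.normal(0.5, 1.0, n)               # single effect of B: +0.5
ab_additive = rng.normal(1.5, 1.0, n)     # effects simply add up
ab_interacting = rng.normal(3.0, 1.0, n)  # more than additive

c0, z0 = interaction_z_test(ctrl, a, b, ab_additive)
c1, z1 = interaction_z_test(ctrl, a, b, ab_interacting)
print(f"additive pair:    contrast={c0:.2f}, z={z0:.2f}")
print(f"interacting pair: contrast={c1:.2f}, z={z1:.2f}")
```

A contrast like this only works when a meaningful scalar measurement is available. The abstract's point is that no such measurement is given for unstructured observations such as images, where the perturbations act on latent variables.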
Peer Reviews
Decision·Submitted to ICLR 2025
This paper is well-written and proposes an interesting method for active experimental design. Estimating the joint density and comparing to the marginal density is an interesting idea, both for the disjointedness analysis and for the separability analysis. Although prior work (e.g., GEARS) has formulated CRISPR pair perturbation response prediction as a matrix completion problem, the active learning formulation seems novel, especially in dealing with imaging modalities (as opposed to RNAseq).
The use of a statistical test for matrix completion is a little poorly motivated. In particular, given an n x n matrix completion problem, at least one pair is likely to have a large test statistic. Given this is an active learning problem, multiple hypothesis testing is not as core of a concern, but I'd like the authors to discuss in their rebuttal what a "null" set of interaction pairs might look like. Concretely, suppose that in a module of 50 genes (so 1225 gene pairs) only 5 gene pairs inte
- The paper addresses the problem of identifying pairwise interactions, specifically highlighting cases where the effect of two perturbations, such as cell lethality from double gene knockout, is entirely different from the effects of each perturbation alone. In the experiments, gene knockout was actually performed to validate effectiveness.
- The two proposed tests are technically intriguing. Each test is well-organized with necessary assumptions and effectively leverages existing theories, inc
- The two proposed interaction tests are not compared with any standard methods. The problem in question is not new; it has a long history in statistics as the "interaction effect," where the combination of two or more factors produces an effect greater (or less) than the sum of their individual effects [1][2]. Traditional applied statistical methods (likelihood-ratio tests, two-way ANOVA, etc) have also been used for identifying synthetic lethality [3], so a comparative analysis and discussion
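For reference, the classical "interaction effect" raised in this review can be written in standard two-way ANOVA form. The notation below is textbook notation chosen for illustration, not notation taken from the paper or the review.

```latex
% Two-way model with interaction for factor levels i, j and replicate k
y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk},
\qquad
H_0 : (\alpha\beta)_{ij} = 0 \ \text{for all } i, j.

% For a 2x2 design (each perturbation absent = 0 or present = 1),
% the interaction reduces to a single contrast of the four cell means:
\delta = \mu_{11} - \mu_{10} - \mu_{01} + \mu_{00},
\qquad
\delta = 0 \iff \text{the two effects are additive.}
```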
The proposed methodology is interesting. The main strength of this work is that it makes the biological experimental process more efficient and with lower costs. The theoretical claims are justified with mathematical proofs and the effectiveness of the algorithm is empirically validated.
I am not a specialist in the biological field of gene perturbation experiments, but based on my understanding I would point out the following potential weaknesses for the improvement of the paper.
1. It is not quantified how much the biological experiments benefit from the active learning algorithm in terms of the total number of necessary perturbations. How much more efficient is your algorithm compared to a standard exhaustive approach that would consider all possible perturbations?
2. I a
Taxonomy
Topics: Data Mining Algorithms and Applications · Rough Sets and Fuzzy Logic
Methods: Random Search
