Large-Scale Targeted Cause Discovery via Learning from Simulated Data
Jang-Hyun Kim, Claudia Skok Gibbs, Sangdoo Yun, Hyun Oh Song, Kyunghyun Cho

TL;DR
This paper introduces a scalable machine learning method that efficiently identifies causal factors of a target variable in large systems by learning from simulated data, bypassing full causal graph reconstruction.
Contribution
The paper presents a neural network-based approach that directly infers the causal variables of a target, with inference cost that scales linearly in the number of variables, making it suitable for large-scale systems such as gene regulatory networks.
Findings
Outperforms existing methods at causal discovery in large-scale gene regulatory networks
Demonstrates strong generalization across different graph structures and mechanisms
Scales efficiently to thousands of variables
Abstract
We propose a novel machine learning approach for inferring causal variables of a target variable from observations. Our focus is on directly inferring a set of causal factors without requiring full causal graph reconstruction, which is computationally challenging in large-scale systems. The identified causal set consists of all potential regulators of the target variable under experimental settings, enabling efficient regulation through intervention. To achieve this, we train a neural network using supervised learning on simulated data to infer causality. By employing a subsampled-ensemble inference strategy, our approach scales with linear complexity in the number of variables, efficiently scaling up to thousands of variables. Empirical results demonstrate superior performance in identifying causal relationships within large-scale gene regulatory networks, outperforming existing…
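The subsampled-ensemble idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`score_subset`, `subsampled_ensemble_scores`), the partitioning scheme, and the parameters `subset_size` and `rounds` are all assumptions made for illustration, and the scorer is a simple absolute-correlation placeholder standing in for the trained neural network. The point is only to show why such a scheme can scale linearly: each round splits the candidate variables into fixed-size subsets, so the number of model calls grows in proportion to the number of variables.

```python
import random
from statistics import mean


def abs_correlation(xs, ys):
    """Placeholder scorer: absolute Pearson correlation.

    Assumption: this stands in for the paper's trained network, which is
    not reproduced here. Correlation is not a causal score.
    """
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / (vx * vy) ** 0.5


def score_subset(data, target, subset):
    """Hypothetical model call: score each candidate in `subset` for `target`.

    A learned model would typically use the whole subset as context; this
    placeholder scores each candidate independently.
    """
    return {j: abs_correlation(data[j], data[target]) for j in subset}


def subsampled_ensemble_scores(data, target, subset_size, rounds, seed=0):
    """Average per-variable scores over random partitions of the candidates.

    `data` is a list of variables, each a list of samples. Every round makes
    about len(data) / subset_size model calls on fixed-size inputs, so the
    total cost is linear in the number of variables for a fixed `rounds`.
    """
    rng = random.Random(seed)
    candidates = [j for j in range(len(data)) if j != target]
    totals = {j: 0.0 for j in candidates}
    counts = {j: 0 for j in candidates}
    for _ in range(rounds):
        # Shuffle, then partition, so each candidate is scored once per round.
        rng.shuffle(candidates)
        for start in range(0, len(candidates), subset_size):
            subset = candidates[start:start + subset_size]
            for j, score in score_subset(data, target, subset).items():
                totals[j] += score
                counts[j] += 1
    return {j: totals[j] / counts[j] for j in candidates}
```

Calling `subsampled_ensemble_scores(data, target=0, subset_size=3, rounds=4)` returns one averaged score per non-target variable, which could then be ranked or thresholded. With the placeholder scorer the averaging is trivial, since each score ignores the rest of the subset; the ensemble step would only matter for a model whose output depends on which other variables it sees.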
Taxonomy
Topics: Data Quality and Management · Data-Driven Disease Surveillance · Biomedical Text Mining and Ontologies
Methods: Sparse Evolutionary Training · Focus
