Finding Exogenous Variables in Data with Many More Variables than Observations
Shohei Shimizu, Takashi Washio, Aapo Hyvarinen, Seiya Imoto

TL;DR
This paper introduces a method that uses non-Gaussianity to identify exogenous variables in a linear non-Gaussian causal model, even when the number of variables far exceeds the number of observations (p>>n).
Contribution
The proposed approach finds exogenous variables in high-dimensional data without estimating the full causal structure, and it requires much smaller sample sizes than conventional methods.
Findings
Effective in p>>n scenarios with artificial data
Successfully applied to gene expression data
Outperforms traditional methods in high-dimensional settings
Abstract
Many statistical methods have been proposed to estimate causal models in classical situations with fewer variables than observations (p<n, p: the number of variables and n: the number of observations). However, modern datasets including gene expression data need high-dimensional causal modeling in challenging situations with orders of magnitude more variables than observations (p>>n). In this paper, we propose a method to find exogenous variables in a linear non-Gaussian causal model, which requires much smaller sample sizes than conventional methods and works even when p>>n. The key idea is to identify which variables are exogenous based on non-Gaussianity instead of estimating the entire structure of the model. Exogenous variables work as triggers that activate a causal chain in the model, and their identification leads to more efficient experimental designs and better understanding…
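The intuition behind ranking variables by non-Gaussianity can be illustrated with a minimal sketch. It is not the authors' procedure. It assumes a toy two-layer linear model with Laplace-distributed external influences, and it uses absolute excess kurtosis as the non-Gaussianity score. The helper names `simulate`, `excess_kurtosis`, and `rank_by_non_gaussianity` are hypothetical. In such a model an exogenous variable consists of a single external influence, whereas an endogenous variable is a weighted sum of several independent influences and so tends to look closer to Gaussian.

```python
import random
from statistics import fmean


def excess_kurtosis(values):
    """Sample excess kurtosis: near 0 for Gaussian data, nonzero otherwise."""
    m = fmean(values)
    centered = [v - m for v in values]
    var = fmean([c * c for c in centered])
    fourth = fmean([c ** 4 for c in centered])
    return fourth / (var * var) - 3.0


def rank_by_non_gaussianity(columns):
    """Return variable indices ordered from most to least non-Gaussian."""
    scores = [abs(excess_kurtosis(col)) for col in columns]
    return sorted(range(len(columns)), key=lambda j: -scores[j])


def simulate(n_obs, n_exo, n_endo, seed=0):
    """Toy linear non-Gaussian model (an assumption made for illustration).

    The first n_exo variables are exogenous: each is one Laplace-distributed
    external influence. Each remaining variable is a weighted sum of all
    exogenous variables plus its own Laplace-distributed external influence.
    """
    rng = random.Random(seed)

    def laplace():
        # The difference of two i.i.d. exponentials is Laplace-distributed.
        return rng.expovariate(1.0) - rng.expovariate(1.0)

    exo = [[laplace() for _ in range(n_obs)] for _ in range(n_exo)]
    endo = []
    for _ in range(n_endo):
        weights = [rng.uniform(0.5, 1.5) for _ in range(n_exo)]
        col = [
            sum(w * exo[k][i] for k, w in enumerate(weights)) + laplace()
            for i in range(n_obs)
        ]
        endo.append(col)
    return exo + endo  # one list of observations per variable


if __name__ == "__main__":
    # p = 203 variables, n = 30 observations: a small p >> n example.
    data = simulate(n_obs=30, n_exo=3, n_endo=200)
    top = rank_by_non_gaussianity(data)[:10]
    print("Most non-Gaussian variable indices (0, 1, 2 are exogenous):", top)
```

With so few observations the sample kurtosis is noisy, so this ranking is only a rough guide. A practical method would need a more robust non-Gaussianity measure and a principled way to decide how many variables to report.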
