Causal Graph Learning via Distributional Invariance of Cause-Effect Relationship
Nang Hung Nguyen, Phi Le Nguyen, Thao Nguyen Truong, Trong Nghia Hoang, Masashi Sugiyama

TL;DR
This paper proposes a novel causal graph learning method based on the invariance of effect distributions conditioned on causes across different prior cause distributions, enabling efficient and scalable causal discovery from observational data.
Contribution
It introduces a distributional invariance-based framework and an efficient algorithm for causal graph recovery with improved scalability and competitive accuracy.
Findings
Outperforms existing methods in scalability, reducing processing time by up to 25x.
Achieves superior or comparable accuracy on large-scale benchmark datasets.
Utilizes invariance of effect distributions to identify causal relationships effectively.
Abstract
This paper introduces a new framework for recovering causal graphs from observational data, leveraging the observation that the distribution of an effect, conditioned on its causes, remains invariant to changes in the prior distribution of those causes. This insight enables a direct test for potential causal relationships by checking the variance of their corresponding effect-cause conditional distributions across multiple downsampled subsets of the data. These subsets are selected to reflect different prior cause distributions, while preserving the effect-cause conditional relationships. Using this invariance test and exploiting an (empirical) sparsity of most causal graphs, we develop an algorithm that efficiently uncovers causal relationships with quadratic complexity in the number of observational variables, reducing the processing time by up to 25x compared to state-of-the-art…
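The invariance property described in the abstract can be illustrated with a toy example. The sketch below is not the paper's algorithm: the binary variables, the conditional probabilities (0.8 and 0.2), the keep rates, and the function names `conditional_table` and `invariance_scores` are all assumptions chosen for illustration. It generates data in which X causes Y, draws downsampled subsets whose selection depends only on X (so each subset has a different prior over the cause), and compares how much the estimated conditional varies across subsets in each direction.

```python
import numpy as np


def conditional_table(a, b):
    """Estimate P(b = 1 | a = v) for v in {0, 1} from binary arrays."""
    return np.array([b[a == v].mean() for v in (0, 1)])


def invariance_scores(n=200_000, keep_rates=(0.2, 0.5, 0.8), seed=0):
    """Toy check, assuming a binary model X -> Y (illustrative only).

    Returns the across-subset variance of the estimated P(Y | X) and of
    the estimated P(X | Y), averaged over the conditioning values.
    """
    rng = np.random.default_rng(seed)

    # Ground truth: X is the cause, Y the effect.
    x = rng.integers(0, 2, size=n)
    y = (rng.random(n) < np.where(x == 1, 0.8, 0.2)).astype(int)

    forward, backward = [], []
    for k in keep_rates:
        # Keep each row with a probability that depends only on X.
        # This shifts the prior P(X) but leaves P(Y | X) untouched.
        keep_prob = np.where(x == 1, k, 1.0 - k)
        mask = rng.random(n) < keep_prob
        xs, ys = x[mask], y[mask]
        forward.append(conditional_table(xs, ys))   # P(Y=1 | X)
        backward.append(conditional_table(ys, xs))  # P(X=1 | Y)

    forward_var = float(np.var(forward, axis=0).mean())
    backward_var = float(np.var(backward, axis=0).mean())
    return forward_var, backward_var


if __name__ == "__main__":
    fwd, bwd = invariance_scores()
    print(f"variance of P(Y|X) across subsets: {fwd:.6f}")
    print(f"variance of P(X|Y) across subsets: {bwd:.6f}")
```

In this toy setting the effect-given-cause conditional stays nearly constant across subsets (only sampling noise remains), while the cause-given-effect conditional shifts with the prior. How the paper constructs its subsets and sets a threshold on the variance is not specified in the text above, so neither is modeled here.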
Peer Reviews
Decision: Submitted to ICLR 2025
The paper is well written. The figure is helpful for understanding the algorithm. The experiments are extensive.
## Minor weaknesses:
* The authors mentioned “finding the maximum clique in an augmented bidirectional graph” multiple times but without a proper definition or example/visualization.
* The source variables should be defined in a little more detail.
* What does $P'$ in equation 2 refer to? It should be precise.
* “The intuition is if we can re-sample $D_i$ from $D \sim P(X)$ such that $D_i \sim P_i(X)$,” This is a little unclear. How are $D \sim P(\mathbf{X})$ and $D_i \sim P_i(\mathbf{X})$ diffe
- This paper is written well, with clear descriptions and motivations.
- The authors propose practical algorithms for causal discovery, with some interesting theoretical findings, e.g., the basis of a DAG, the minimal downsampling rate, etc.
- The experiments under synthetic datasets and real-world networks are extensive, which verified the advantages in large-scale datasets.
- Some details seem to be missing in the paper. For example:
  - i) Footnote 2 and Theorem 1 tell how to find non-parent sets, whereas how to set the threshold for the variance is not clear. Please give the details in the paper.
  - ii) How to learn the different priors $P_i(X)$, with the estimated $m$? Did the authors assume some distributions?
- Theorem 2 provides a necessary condition to test whether a subset $Z$ is the parent set of X. However, it is not a sufficient condition. Although the auth
1. The proposed method is rather novel.
2. Overall, the paper is well-structured and clearly written.
3. The experiments are extensive, covering 3 types of functional causal models, 6 causal discovery baseline methods, and varying graph sizes.
Any thoughts on how to extend your method to handle heterogeneous or time-series datasets?
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Advanced Causal Inference Techniques · Explainable Artificial Intelligence (XAI)
