Differentiable Causal Discovery For Latent Hierarchical Causal Models
Parjanya Prashant, Ignavier Ng, Kun Zhang, Biwei Huang

TL;DR
This paper introduces a differentiable causal discovery method for nonlinear latent hierarchical models that avoids the poor scalability of discrete search and the linearity or invertibility assumptions of earlier methods, and demonstrates superior performance on high-dimensional data.
Contribution
It provides the first differentiable causal discovery algorithm for nonlinear latent hierarchical models, with new theoretical identifiability results and practical high-dimensional applications.
Findings
Outperforms existing methods in accuracy and scalability.
Successfully learns interpretable hierarchical latent structures from image data.
Effective on downstream tasks with high-dimensional data.
Abstract
Discovering causal structures with latent variables from observational data is a fundamental challenge in causal discovery. Existing methods often rely on constraint-based, iterative discrete searches, limiting their scalability to large numbers of variables. Moreover, these methods frequently assume linearity or invertibility, restricting their applicability to real-world scenarios. We present new theoretical results on the identifiability of nonlinear latent hierarchical causal models, relaxing previous assumptions in the literature about the deterministic nature of latent variables and exogenous noise. Building on these insights, we develop a novel differentiable causal discovery algorithm that efficiently estimates the structure of such models. To the best of our knowledge, this is the first work to propose a differentiable causal discovery method for nonlinear latent hierarchical…
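To make the model class in the abstract concrete, the sketch below samples from a toy nonlinear latent hierarchical model. It is a minimal illustration under our own assumptions, not the authors' data generator: the three-level structure, the random one-hidden-layer LeakyReLU mechanisms, and the noise scale are all hypothetical choices. Every non-root variable receives its own exogenous noise term, so no latent is a deterministic function of its parents, and only the leaves X1 to X4 are returned as observed data.

```python
import numpy as np

# Hypothetical toy model: structure, mechanisms, and noise are illustrative assumptions.

def leaky_relu(x, slope=0.2):
    # Piecewise-linear nonlinearity used inside each causal mechanism.
    return np.where(x > 0, x, slope * x)

def random_mechanism(rng, n_parents, hidden=8):
    # One-hidden-layer MLP f(parents) with random weights (hypothetical choice).
    w1 = rng.normal(size=(n_parents, hidden))
    w2 = rng.normal(size=(hidden,))
    return lambda pa: leaky_relu(pa @ w1) @ w2

def sample_hierarchy(n, noise_scale=0.5, seed=0):
    """Sample a 3-level hierarchy: L0 -> (L1, L2) -> (X1..X4).

    Each non-root variable is f(parents) + independent Gaussian noise.
    Returns the observed leaves (n x 4) and a dict of all variables.
    """
    rng = np.random.default_rng(seed)
    parents = {
        "L1": ["L0"], "L2": ["L0"],
        "X1": ["L1"], "X2": ["L1"],
        "X3": ["L2"], "X4": ["L2"],
    }
    values = {"L0": rng.normal(size=n)}  # root latent
    # Dict order above is a valid topological order, so parents exist first.
    for var, pa in parents.items():
        f = random_mechanism(rng, len(pa))
        pa_vals = np.column_stack([values[p] for p in pa])  # shape (n, |pa|)
        values[var] = f(pa_vals) + noise_scale * rng.normal(size=n)
    observed = np.column_stack([values[f"X{i}"] for i in range(1, 5)])
    return observed, values

observed, latents = sample_hierarchy(n=1000)
print(observed.shape)  # (1000, 4)
```

A causal discovery method in this setting sees only `observed` and has to recover the latent layers and the edges above the measured variables.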
Peer Reviews
Decision: ICLR 2025 Poster
1. The paper is, to the best of my knowledge, the first to provide identifiability results for nonlinear latent hierarchical causal models. The proof technique seems correct to me, though I did not check it thoroughly (for example, the appendix).
2. Estimating equation 9 using the Donsker-Varadhan representation is novel.
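The Donsker-Varadhan (DV) representation mentioned in this review states that $\mathrm{KL}(P\|Q) = \sup_T \mathbb{E}_P[T] - \log \mathbb{E}_Q[e^{T}]$, so any critic $T$ gives a lower bound, with equality when $T = \log(p/q)$ up to an additive constant. The review does not say what equation 9 estimates, so the sketch below illustrates only the DV bound itself, on two unit-variance Gaussians whose KL divergence is known to be 0.5. In practice $T$ would be a trainable network maximized by gradient ascent; here two fixed, hypothetical critics stand in for it.

```python
import numpy as np

def dv_lower_bound(t_p, t_q):
    """Donsker-Varadhan bound: KL(P||Q) >= E_P[T] - log E_Q[exp(T)].

    t_p: critic values T(x) on samples x ~ P
    t_q: critic values T(x) on samples x ~ Q
    """
    # Numerically stable log-mean-exp of the critic on Q samples.
    m = np.max(t_q)
    log_mean_exp = m + np.log(np.mean(np.exp(t_q - m)))
    return np.mean(t_p) - log_mean_exp

rng = np.random.default_rng(0)
xp = rng.normal(loc=1.0, size=200_000)  # P = N(1, 1)
xq = rng.normal(loc=0.0, size=200_000)  # Q = N(0, 1); true KL(P||Q) = 0.5

critic_opt = lambda x: x - 0.5   # log p(x)/q(x): the bound is tight
critic_weak = lambda x: 0.5 * x  # a suboptimal (hypothetical) critic

print(dv_lower_bound(critic_opt(xp), critic_opt(xq)))    # close to 0.5
print(dv_lower_bound(critic_weak(xp), critic_weak(xq)))  # about 0.375
```

Replacing the fixed critic with a neural network and maximizing the bound over its parameters turns this into a differentiable estimator, which is what makes DV-style objectives usable inside gradient-based training.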
1. **Experimental limitations**:
a. **Synthetic experiments**: Instead of experimenting on just the 4 structures given in Figure 3, I would encourage the authors to randomly generate DAGs and run experiments on these structures. The analysis of the synthetic experiments would also be stronger if the authors tried other nonlinear activations in Eq. 1 instead of a piecewise linear activation such as LeakyReLU.
b. **Real experiments**: The baselines for the experiments on CMNIST are VAE and $\beta$-VAE -
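The reviewer's suggestion to test on randomly generated structures could be implemented along the lines of the sketch below, which samples layered DAGs of the kind a latent hierarchical model assumes. The layered design, the `max_parents` cap, and the naming scheme are our own hypothetical choices, not the paper's. Edges only run from one layer to the next, so the graph is acyclic by construction and the measured variables in the last layer have no children; every latent is also given at least one child, since a childless latent would leave no trace in the observed data.

```python
import random

def random_layered_dag(layer_sizes, max_parents=2, seed=None):
    """Sample a random layered DAG for a latent hierarchical model (hypothetical scheme).

    layer_sizes[0] is the number of root latents and layer_sizes[-1] the number
    of measured variables. Returns a dict mapping each node to its sorted parents.
    """
    rng = random.Random(seed)
    names = []
    for k, size in enumerate(layer_sizes):
        # Last layer holds measured variables X*, earlier layers hold latents Lk_*.
        prefix = "X" if k == len(layer_sizes) - 1 else f"L{k}_"
        names.append([f"{prefix}{i}" for i in range(size)])

    parents = {v: [] for v in names[0]}  # roots have no parents
    for upper, lower in zip(names, names[1:]):
        for v in lower:
            k = rng.randint(1, min(max_parents, len(upper)))
            parents[v] = sorted(rng.sample(upper, k))
        # Give every childless node in the upper layer one child
        # (this repair step may push a node past max_parents).
        for u in upper:
            if not any(u in parents[v] for v in lower):
                v = rng.choice(lower)
                parents[v] = sorted(parents[v] + [u])
    return parents

print(random_layered_dag([1, 3, 6], seed=0))
```

Sampling many such graphs with varying `layer_sizes` would give a benchmark distribution rather than a handful of fixed structures.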
- I thought this was interesting, original work. The class of graphs that they study is obviously limited but seems practical, and the rank condition is intuitive.
- The paper is very well written: both the theory and methods sections do a good job of explaining the intuition for why the method works.
- The empirical results are strong on the datasets that they tested.
* The coloured MNIST results appear very strong (though this is not my area), but they are not contextualized in the domain generalization literature. I would at least have expected you to report the published numbers from recent work in that setting. Are autoencoders and Beta-VAE really the right baselines?
* I would have liked a more detailed discussion of the learned MNIST graph. I am not sure what to make of Figure 4 or Table 3 in the appendix. Do those latents make sense? Is there a natural hierarchical
I really like that the evaluation is done not only with respect to a causal metric but also with respect to "a regression classifier trained on the learned representation". If the causality field moved towards the standard evaluation practices of deep learning, progress would be faster, and this paper is one of the few that actually performs this evaluation! However, on a closer reading, Table 1 is again evaluated with respect to discovery metrics only, and Table 2 is evaluated with a le
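The downstream evaluation this reviewer praises, a regression classifier trained on the learned representation, is commonly called a probe. A minimal sketch of one follows, assuming binary labels and plain logistic regression fitted by gradient descent; the paper's actual classifier, labels (e.g., the CMNIST classes), and training details are not given here and may differ. The synthetic `z_informative` and `z_noise` arrays are hypothetical stand-ins for learned representations.

```python
import numpy as np

def logistic_probe_accuracy(z, y, train_frac=0.8, lr=0.1, steps=2000, seed=0):
    """Fit a binary logistic-regression probe on representations z; return test accuracy."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_train = int(train_frac * len(y))
    tr, te = idx[:n_train], idx[n_train:]
    # Standardize features using training statistics only.
    mu, sd = z[tr].mean(axis=0), z[tr].std(axis=0) + 1e-8
    x = np.hstack([(z - mu) / sd, np.ones((len(y), 1))])  # append bias column
    w = np.zeros(x.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-x[tr] @ w))         # predicted P(y = 1)
        w -= lr * x[tr].T @ (p - y[tr]) / len(tr)    # gradient of mean log-loss
    preds = (x[te] @ w > 0).astype(int)
    return float(np.mean(preds == y[te]))

# Hypothetical representations: one carries label information, one is pure noise.
rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=2000)
z_informative = np.column_stack([y + 0.3 * rng.normal(size=2000), rng.normal(size=2000)])
z_noise = rng.normal(size=(2000, 2))

print(logistic_probe_accuracy(z_informative, y))  # high accuracy
print(logistic_probe_accuracy(z_noise, y))        # near chance
```

Keeping the probe simple and linear means that high accuracy reflects information that the representation already makes easy to read out, rather than the capacity of the classifier.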
The key claimed advantage of the improved identifiability results comes from instead assuming that measured variables have no children (the paper does "not yet account for structures where measured variables have children"). There is some exchangeability between these assumptions, and in that sense I agree that the current assumption is the more practical one, but it is not a novel one or a clear contribution until a clear relation between the assumptions is shown. The evaluation is really lacking with respect to datasets and shown clear benefits ac
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Semantic Web and Ontologies · Data Quality and Management
