A Meta-Learning Approach to Bayesian Causal Discovery

Anish Dhir; Matthew Ashman; James Requeima; Mark van der Wilk

arXiv:2412.16577·cs.LG·March 6, 2025

TL;DR

This paper introduces a Bayesian meta-learning model that effectively samples from the causal structure posterior, capturing key properties and improving over existing methods in Bayesian causal discovery.

Contribution

It presents a novel meta-learning approach that encodes correlation between edges and permutation equivariance with respect to nodes, enabling reliable sampling from the causal structure posterior.

Findings

01 Outperforms existing Bayesian causal discovery methods

02 Successfully samples from the causal structure posterior

03 Encodes key properties of the posterior: correlation between edges and permutation equivariance with respect to nodes
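
The permutation property named in the findings above (the abstract calls it permutation equivariance with respect to nodes) can be illustrated with a small check: relabelling the variables of a dataset should relabel the rows and columns of the predicted edge matrix in the same way. The sketch below is only an illustration under stated assumptions. `edge_scores` is a hypothetical toy scorer based on absolute correlations, not the paper's model, and `is_permutation_equivariant` is a helper name introduced here.

```python
import numpy as np

def edge_scores(data):
    """Toy stand-in for a model (assumption, not the paper's network):
    absolute correlation between columns, with a zeroed diagonal."""
    scores = np.abs(np.corrcoef(data, rowvar=False))
    np.fill_diagonal(scores, 0.0)
    return scores

def is_permutation_equivariant(fn, data, perm):
    """Check fn(data with permuted columns) == fn(data) with rows and
    columns permuted in the same way."""
    permuted_out = fn(data[:, perm])
    expected = fn(data)[np.ix_(perm, perm)]
    return np.allclose(permuted_out, expected)

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 3))   # 200 samples of 3 variables
perm = np.array([2, 0, 1])         # a relabelling of the 3 nodes

# The correlation-based scorer does not depend on node order.
print(is_permutation_equivariant(edge_scores, data, perm))

# Keeping only the upper triangle ties the output to the node order,
# so this variant is not equivariant.
print(is_permutation_equivariant(lambda d: np.triu(edge_scores(d)), data, perm))
```

A scorer that fails this check would assign different edge probabilities to the same dataset depending only on how its columns happen to be ordered.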

Abstract

Discovering a unique causal structure is difficult due to both inherent identifiability issues and the consequences of finite data. As such, estimates of uncertainty over causal structures, such as those obtained from a Bayesian posterior, are often necessary for downstream tasks. Finding an accurate approximation to this posterior is challenging, due to the large number of possible causal graphs, as well as the difficulty in the subproblem of finding posteriors over the functional relationships of the causal edges. Recent works have used meta-learning to view the problem of estimating the maximum a-posteriori causal graph as supervised learning. Yet, these methods are limited when estimating the full posterior as they fail to encode key properties of the posterior, such as correlation between edges and permutation equivariance with respect to nodes. Further, these methods also cannot reliably…

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01·Rating 6·Confidence 3

Strengths

1. This paper provides a well-structured and self-contained summary of prior works in Bayesian causal discovery, allowing readers to clearly follow the evolution of Bayesian approaches for learning causal graphs. As a reviewer, this summary enables me to track the advancements in the field and understand the recent progress in addressing challenges like scalability and uncertainty in causal inference.
2. The paper clearly outlines its unique contributions in Table 1, highlighting significant ad

Weaknesses

1. One of the main limitations of this paper is the assumption that there are no latent confounders between variables. This assumption may limit the applicability of the method to real-world datasets where unobserved confounders are common.
2. I think this method relies heavily on assumptions about the choice of function class F, the types of graphs, and the noise model. Unlike fully nonparametric approaches like the PC algorithm, which do not impose strict functional or noise

Reviewer 02·Rating 6·Confidence 4

Strengths

- As summarized in Table 1, this approach implements several key desiderata for a Bayesian meta-learning model more effectively than existing alternatives.
- The paper is well-written, allowing readers to easily follow the motivation and desiderata of Bayesian meta-learning models.

Weaknesses

- While I appreciate the overall architecture of the model, it is challenging to identify which components provide truly novel technical contributions. For instance, the results on pages 4 and 5 appear to largely rely on existing findings, and, if I'm not mistaken, page 6 includes results from Annadani et al. (2024). The contributions of this paper are not clearly or explicitly articulated.
- The Bayesian prior is learned through pairs of datasets and directed acyclic graphs (DAGs). From my pers

Reviewer 03·Rating 6·Confidence 2

Strengths

The paper proposes a technique that can directly sample from the posterior over graphs. The main contribution of the paper is in the architecture of the encoder-decoder network, which captures necessary properties such as permutation invariance through cross-attention, and acyclicity and edge dependencies by sampling the decomposition of a DAG into permutation and lower-triangular matrices. The comparison of the proposed model is done with existing meta-learning approaches and Bayesian approaches usi

Weaknesses

My main concern is that this is an incremental contribution. The framework is not new, and the only contributions I see are changes in how permutation invariance and acyclicity are incorporated. Please let me know if I am missing something else. I have a detailed list of questions below whose answers might help strengthen the paper.
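
The decomposition Reviewer 03 mentions, a DAG written as a permutation combined with a lower-triangular matrix, can be sketched in a few lines. This is a minimal illustration of why that parameterisation guarantees acyclicity, not the paper's sampler: `sample_dag`, `is_acyclic`, and the uniform `edge_prob` are assumptions introduced here, whereas the paper's model would predict these quantities from data.

```python
import numpy as np

def sample_dag(num_nodes, edge_prob, rng):
    """Sample a DAG adjacency matrix as P @ L @ P.T (hypothetical helper).

    L is a strictly lower-triangular binary matrix and P is a permutation
    matrix, so the result is acyclic by construction.
    """
    # Strictly lower-triangular mask: no self-loops, edges respect one ordering.
    lower = np.tril(rng.random((num_nodes, num_nodes)) < edge_prob, k=-1).astype(int)
    # Random relabelling of the nodes, encoded as a permutation matrix.
    perm = rng.permutation(num_nodes)
    perm_matrix = np.eye(num_nodes, dtype=int)[perm]
    return perm_matrix @ lower @ perm_matrix.T

def is_acyclic(adj):
    """A directed graph on n nodes is acyclic iff the n-th power of its
    adjacency matrix is zero (no walks of length n exist)."""
    n = adj.shape[0]
    return not np.linalg.matrix_power(adj, n).any()

rng = np.random.default_rng(0)
adjacency = sample_dag(num_nodes=5, edge_prob=0.5, rng=rng)
print(adjacency)
print(is_acyclic(adjacency))
```

Because a strictly lower-triangular matrix is nilpotent and conjugating by a permutation only relabels nodes, every sample is a valid DAG and no separate acyclicity penalty or rejection step is needed.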

Code & Models

Repositories

Anish144/CausalStructureNeuralProcess
pytorch

Taxonomy

Topics: Bayesian Modeling and Causal Inference