Demystifying amortized causal discovery with transformers

Francesco Montagna; Max Cairney-Leeming; Dhanya Sridhar; Francesco Locatello

arXiv:2405.16924 · cs.LG · March 19, 2026

TL;DR

This paper analyzes how transformer-based amortized causal discovery methods implicitly rely on a prior over causal models and on identifiability. It shows that these methods do not generalize to classes of causal models unseen during training, and that their behavior is consistent with classical identifiability theory.

Contribution

The paper bridges the gap between amortized causal discovery and identifiability theory, and analyzes the conditions under which training on multiple causal models improves generalization.

Findings

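01. The training distribution implicitly defines a prior on the causal model of the test data.

02. CSIvA cannot generalize to classes of causal models unseen during training; the paper analyzes when training on multiple causal models helps.

03. Amortized methods are still subject to identifiability constraints.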
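
To make the identifiability constraint concrete, the sketch below works through a standard textbook case that is not taken from the paper: a bivariate linear model with Gaussian noise. Because a zero-mean Gaussian joint is fully determined by its covariance, showing that a Y -> X model reproduces the covariance of an X -> Y model shows that the two directions cannot be told apart from observational data. The function names and parameter values are illustrative assumptions.

```python
import numpy as np

def forward_cov(a, var_x, var_n):
    """Covariance of (X, Y) for X ~ N(0, var_x), Y = a*X + N, N ~ N(0, var_n)."""
    var_y = a**2 * var_x + var_n
    cov_xy = a * var_x
    return np.array([[var_x, cov_xy], [cov_xy, var_y]])

def backward_params(cov):
    """Coefficient and variances of a Y -> X linear Gaussian model with the same covariance."""
    var_x, cov_xy, var_y = cov[0, 0], cov[0, 1], cov[1, 1]
    b = cov_xy / var_y               # regression coefficient of X on Y
    var_m = var_x - b**2 * var_y     # residual noise variance in the backward model
    return b, var_y, var_m

def backward_cov(b, var_y, var_m):
    """Covariance of (X, Y) for Y ~ N(0, var_y), X = b*Y + M, M ~ N(0, var_m)."""
    var_x = b**2 * var_y + var_m
    cov_xy = b * var_y
    return np.array([[var_x, cov_xy], [cov_xy, var_y]])

# Illustrative parameter values (assumptions, not from the paper).
cov_fwd = forward_cov(a=2.0, var_x=1.0, var_n=0.5)
b, var_y, var_m = backward_params(cov_fwd)
cov_bwd = backward_cov(b, var_y, var_m)

# Both causal directions induce the same zero-mean Gaussian joint distribution.
same_joint = np.allclose(cov_fwd, cov_bwd)
```

In this setting no method, amortized or classical, can recover the direction from the data alone; any preference for one direction would have to come from the prior.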

Abstract

Supervised learning for causal discovery from observational data often achieves competitive performance despite seemingly avoiding the explicit assumptions that traditional methods require for identifiability. In this work, we analyze CSIvA (Ke et al., 2023), a transformer architecture for amortized inference that promises to train on synthetic data and transfer to real data, on bivariate causal models. First, we bridge the gap with identifiability theory, showing that the training distribution implicitly defines a prior on the causal model of the test observations: consistent with classical approaches, good performance is achieved when we have a good prior on the test data and the underlying model is identifiable. Second, we find that CSIvA cannot generalize to classes of causal models unseen during training: to overcome this limitation, we theoretically and empirically analyze…
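
The claim that the training distribution acts as a prior can be illustrated with a minimal sketch of how synthetic bivariate training tasks might be sampled. This is not CSIvA's data pipeline: the two model classes, their mechanisms, the `class_weights` argument, and all function names are hypothetical choices made for illustration. The point is only that the mixture weights over model classes determine which causal models the learner ever sees.

```python
import numpy as np

# Hypothetical model classes; mechanisms and noise scales are illustrative only.
def sample_linear_nongaussian(rng, n):
    cause = rng.uniform(-1, 1, n)
    effect = rng.uniform(0.5, 2.0) * cause + rng.uniform(-0.5, 0.5, n)
    return cause, effect

def sample_nonlinear_additive(rng, n):
    cause = rng.normal(size=n)
    effect = np.tanh(rng.uniform(0.5, 2.0) * cause) + 0.3 * rng.normal(size=n)
    return cause, effect

MODEL_CLASSES = {
    "linear_nongaussian": sample_linear_nongaussian,
    "nonlinear_additive": sample_nonlinear_additive,
}

def sample_task(rng, class_weights, n=200):
    """Draw one labeled dataset: the weights play the role of a prior over model classes."""
    names = list(class_weights)
    probs = np.array([class_weights[k] for k in names], dtype=float)
    probs /= probs.sum()
    name = names[rng.choice(len(names), p=probs)]
    cause, effect = MODEL_CLASSES[name](rng, n)
    # Randomize which column holds the cause; label 1 means X -> Y, 0 means Y -> X.
    x_causes_y = bool(rng.random() < 0.5)
    data = np.column_stack((cause, effect) if x_causes_y else (effect, cause))
    return data, int(x_causes_y), name

rng = np.random.default_rng(0)
# A training prior that puts zero mass on the nonlinear class.
weights = {"linear_nongaussian": 1.0, "nonlinear_additive": 0.0}
tasks = [sample_task(rng, weights) for _ in range(50)]
```

With these weights, a supervised model trained on `tasks` never observes the nonlinear class, which is the kind of train/test mismatch the abstract describes. Changing `weights` to cover both classes corresponds to training on multiple causal models.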

Taxonomy

Topics: Scientific Computing and Data Management