Prediction Bottlenecks Don't Discover Causal Structure (But Here's What They Actually Do)
Ankit Hemant Lade, Sai Krishna Jasti, Indar Kumar, Aman Chadha

TL;DR
The paper critically evaluates claims that prediction bottlenecks reveal causal structure, showing that simpler methods perform as well or better and that observed effects are confounded by sample size and intervention schemes.
Contribution
It provides a standardized falsification benchmark to test causal claims from prediction models and demonstrates that many purported causal signals are confounded or method-agnostic.
Findings
Linear bottlenecks perform as well as or better than complex models.
Tuned Lasso outperforms bottlenecks on synthetic and real benchmarks.
Intervention effects are largely explained by sample size and standard interventions.
Abstract
A Mamba state-space model trained only for next-step prediction appears to recover Granger-causal structure through a simple readout, with early experiments suggesting the phenomenon generalized across architectures and benefited from interventional data at . We package the protocol used to test that claim -- standardized synthetic generators (VAR/Lorenz/CauseMe-style), three intervention semantics (, soft-noise, random-forcing), edge-provenance cards on three real datasets, and size-matched control arms -- as a reusable falsification benchmark, and walk the claim through it in five stages. The method-level claim does not survive: (i) a plain linear bottleneck does as well or better; (ii) tuned Lasso beats the bottleneck on synthetic CauseMe-style benchmarks, and on Lorenz-96 (the only real benchmark with unambiguous ground truth) classical…
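To make the classical baseline concrete, here is a minimal sketch of a Lasso-style Granger readout on a synthetic VAR(1) process. It is an illustration under stated assumptions, not the authors' benchmark code: the function names, the 3-variable chain, the fixed penalty `alpha` (the paper tunes it; this sketch does not), the hand-rolled coordinate-descent solver, and the 0.1 edge threshold are all hypothetical choices made for this example.

```python
import numpy as np

def simulate_var1(A, n_steps, noise_std=1.0, seed=0):
    """Simulate x_t = A @ x_{t-1} + noise. A[i, j] != 0 means j -> i."""
    rng = np.random.default_rng(seed)
    d = A.shape[0]
    x = np.zeros((n_steps, d))
    for t in range(1, n_steps):
        x[t] = A @ x[t - 1] + noise_std * rng.standard_normal(d)
    return x

def lasso_cd(X, y, alpha, n_iter=200):
    """Coordinate descent for (1/2n) * ||y - Xw||^2 + alpha * ||w||_1."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()  # residual for the current w (w starts at zero)
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            resid += X[:, j] * w[j]        # drop feature j's contribution
            rho = X[:, j] @ resid / n
            # Soft-thresholding update for coordinate j.
            w[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
            resid -= X[:, j] * w[j]        # add the updated contribution back
    return w

def lasso_granger(x, alpha=0.05):
    """Score edge j -> i by |coef| of x_j[t-1] when predicting x_i[t]."""
    past, future = x[:-1], x[1:]
    d = x.shape[1]
    W = np.zeros((d, d))
    for i in range(d):
        W[i] = lasso_cd(past, future[:, i], alpha)
    return np.abs(W)

# Hypothetical ground truth: a chain 0 -> 1 -> 2 with self-loops.
A_true = np.array([[0.5, 0.0, 0.0],
                   [0.4, 0.5, 0.0],
                   [0.0, 0.4, 0.5]])
x = simulate_var1(A_true, n_steps=2000)
scores = lasso_granger(x)
est_edges = scores > 0.1      # illustrative threshold, an assumption
true_edges = A_true != 0
print(est_edges)
```

On a linear, well-sampled system like this one, a sparse lag-1 regression is already a strong reference point, which is why any learned-bottleneck readout needs to be compared against it under the same sample size.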
