Sequential Causal Discovery with Noisy Language Model Priors
Prakhar Verma, David Arbour, Sunav Choudhary, Harshita Chopra, Arno Solin, Atanu R. Sinha

TL;DR
This paper introduces a hybrid causal discovery framework that integrates sequential observational data with noisy language model priors, improving accuracy and robustness in real-world scenarios.
Contribution
It proposes an adaptive method that combines sequential batch data with LM-derived knowledge within a Partial Ancestral Graph (PAG) representation, accounting for both data-induced and LM-induced biases.
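The PAG representation is what lets the method hold ambiguity explicitly. In a PAG, each edge carries an endpoint mark at both ends: a tail, an arrowhead, or a circle. A circle means that endpoint is not fixed across the equivalence class. The following is a minimal Python sketch of that encoding. The class and method names (`Mark`, `PAGEdge`, `is_ambiguous`) are hypothetical illustrations, not the paper's code.

```python
from dataclasses import dataclass
from enum import Enum


class Mark(Enum):
    """Endpoint marks that can appear at either end of a PAG edge."""
    TAIL = "tail"
    ARROW = "arrowhead"
    CIRCLE = "circle"  # endpoint not fixed across the equivalence class


# ASCII glyphs for rendering, depending on which end the mark sits at.
_LEFT = {Mark.TAIL: "-", Mark.ARROW: "<", Mark.CIRCLE: "o"}
_RIGHT = {Mark.TAIL: "-", Mark.ARROW: ">", Mark.CIRCLE: "o"}


@dataclass(frozen=True)
class PAGEdge:
    """Hypothetical edge type: nodes a, b and the mark at each end."""
    a: str
    b: str
    mark_at_a: Mark
    mark_at_b: Mark

    def is_ambiguous(self) -> bool:
        # A circle at either end leaves that endpoint undetermined.
        return Mark.CIRCLE in (self.mark_at_a, self.mark_at_b)

    def b_not_ancestor_of_a(self) -> bool:
        # An arrowhead at b asserts that b is not an ancestor of a.
        return self.mark_at_b is Mark.ARROW

    def __str__(self) -> str:
        return f"{self.a} {_LEFT[self.mark_at_a]}-{_RIGHT[self.mark_at_b]} {self.b}"


# Usage: a fully oriented edge versus one with an undetermined endpoint.
directed = PAGEdge("X", "Y", Mark.TAIL, Mark.ARROW)
partial = PAGEdge("X", "Z", Mark.CIRCLE, Mark.ARROW)
print(directed, directed.is_ambiguous())  # X --> Y False
print(partial, partial.is_ambiguous())    # X o-> Z True
```

A DAG has no way to express the `X o-> Z` case, which is why the circle mark provides a natural place for external knowledge such as LM answers to be applied.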
Findings
Outperforms prior methods in structural accuracy across datasets.
Extends to parameter estimation with robustness to LM noise.
Uses a sequential optimization scheme for informative edge querying.
Abstract
Causal discovery from observational data typically assumes access to complete data and the availability of perfect domain experts. In practice, data often arrive in batches and are subject to sampling bias, while expert knowledge is scarce. Language Models (LMs) offer a surrogate for expert knowledge but suffer from hallucinations, inconsistencies, and bias. We present a hybrid framework that bridges these gaps by adaptively integrating sequential batch data with noisy, LM-derived expert knowledge while accounting for both data-induced and LM-induced biases. We propose a representation shift from Directed Acyclic Graphs (DAGs) to Partial Ancestral Graphs (PAGs), which accommodate ambiguities within a coherent framework and allow the global LM knowledge to be grounded in local observational data. To guide LM interactions, we use a sequential optimization scheme that adaptively queries the most informative…
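The abstract says that LM queries are chosen adaptively to be the most informative, but the text is cut off before the criterion is given. One standard way to make "most informative" precise is expected information gain under an explicit noise model. The sketch below uses that criterion as an illustration; it is not the authors' scheme. It assumes three orientation states per unresolved variable pair, a symmetric LM error rate `eps`, and a hypothetical `ask_lm` callable.

```python
import math

# Hypothetical orientation states for an unresolved pair (a, b).
STATES = ("a->b", "b->a", "none")


def entropy(p):
    """Shannon entropy (nats) of a dict mapping state -> probability."""
    return -sum(x * math.log(x) for x in p.values() if x > 0)


def likelihood(answer, state, eps):
    # Assumed symmetric noise: the LM reports the true state with
    # probability 1 - eps, otherwise one of the other states uniformly.
    return 1 - eps if answer == state else eps / (len(STATES) - 1)


def update(prior, answer, eps):
    """Bayesian posterior over STATES after one noisy LM answer."""
    unnorm = {s: prior[s] * likelihood(answer, s, eps) for s in STATES}
    z = sum(unnorm.values())
    return {s: v / z for s, v in unnorm.items()}


def expected_info_gain(prior, eps):
    """Expected entropy reduction from asking the LM about one pair."""
    gain = entropy(prior)
    for ans in STATES:
        p_ans = sum(prior[s] * likelihood(ans, s, eps) for s in STATES)
        if p_ans > 0:
            gain -= p_ans * entropy(update(prior, ans, eps))
    return gain


def pick_query(beliefs, eps):
    """Return the pair whose LM answer is expected to be most informative."""
    return max(beliefs, key=lambda pair: expected_info_gain(beliefs[pair], eps))


def run(beliefs, ask_lm, eps, budget):
    """Sequentially query `ask_lm` (hypothetical) `budget` times."""
    for _ in range(budget):
        pair = pick_query(beliefs, eps)
        beliefs[pair] = update(beliefs[pair], ask_lm(pair), eps)
    return beliefs


beliefs = {
    ("X", "Y"): {"a->b": 1 / 3, "b->a": 1 / 3, "none": 1 / 3},
    ("Y", "Z"): {"a->b": 0.90, "b->a": 0.05, "none": 0.05},
}
print(pick_query(beliefs, eps=0.2))  # ('X', 'Y')
```

Under this symmetric noise model, the pair with uniform beliefs, (X, Y), is preferred over the nearly resolved pair (Y, Z). The value `eps=0.2` is arbitrary. A real LM would also need its error rate estimated rather than assumed.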
