DMCD: Semantic-Statistical Framework for Causal Discovery

Samarth KaPatel; Sofia Nikiforova; Giacinto Paolo Saggese; Paul Smith

arXiv:2602.20333 · cs.AI · February 25, 2026

TL;DR

DMCD is a two-phase causal discovery framework that drafts a causal graph by semantic reasoning over variable metadata and then validates the draft statistically against observational data, improving accuracy on real-world datasets.

Contribution

The paper introduces DMCD, a causal discovery method that pairs LLM-based semantic drafting of a sparse DAG with conditional independence testing on observational data to improve structure learning accuracy.

Findings

01

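Achieves competitive or leading performance against diverse causal discovery baselines on three metadata-rich real-world benchmarks (industrial engineering, environmental monitoring, and IT systems analysis).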

02

Shows particularly large gains in recall and F1 score over the baselines.

03

Probing and ablation experiments suggest that the gains come from semantic reasoning over metadata rather than memorization.
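
The recall and F1 figures mentioned in the findings are usually computed over graph edges. The following is a minimal Python sketch of that computation, assuming edges are compared as directed (source, target) pairs; this summary does not state the paper's exact evaluation protocol, and the function name edge_scores is hypothetical.

```python
def edge_scores(predicted, truth):
    """Precision, recall, and F1 over directed edges.

    Assumption: an edge counts as correct only if both endpoints and
    the direction match. The paper's protocol may differ.
    """
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Recall penalizes true edges that a method misses, so a method that proposes too few edges can score high precision and still have low recall and F1.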

Abstract

We present DMCD (DataMap Causal Discovery), a two-phase causal discovery framework that integrates LLM-based semantic drafting from variable metadata with statistical validation on observational data. In Phase I, a large language model proposes a sparse draft DAG, serving as a semantically informed prior over the space of possible causal structures. In Phase II, this draft is audited and refined via conditional independence testing, with detected discrepancies guiding targeted edge revisions. We evaluate our approach on three metadata-rich real-world benchmarks spanning industrial engineering, environmental monitoring, and IT systems analysis. Across these datasets, DMCD achieves competitive or leading performance against diverse causal discovery baselines, with particularly large gains in recall and F1 score. Probing and ablation experiments suggest that these improvements arise from…
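
The Phase II audit described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not name the conditional independence test, so the sketch uses a Fisher z test on partial correlations (reasonable only for roughly linear-Gaussian data); the draft DAG is passed in as a plain list of edges standing in for the Phase I LLM output; and the names partial_corr, fisher_z_pvalue, and audit_draft are hypothetical.

```python
import math
import numpy as np


def partial_corr(data, i, j, cond):
    """Partial correlation of columns i and j given the columns in cond."""
    idx = [i, j] + list(cond)
    corr = np.corrcoef(data[:, idx], rowvar=False)
    prec = np.linalg.pinv(corr)  # precision matrix of the selected columns
    return -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])


def fisher_z_pvalue(r, n, k):
    """Two-sided p-value for H0: partial correlation is zero.

    n is the sample size, k the size of the conditioning set;
    requires n - k - 3 > 0.
    """
    r = max(min(r, 0.999999), -0.999999)  # keep the log finite
    z = 0.5 * math.log((1 + r) / (1 - r))
    stat = math.sqrt(n - k - 3) * abs(z)
    return math.erfc(stat / math.sqrt(2))


def audit_draft(data, names, draft_edges, alpha=0.01):
    """Compare a draft DAG with conditional independence tests.

    Assumed audit rule (not taken from the paper): each pair of variables
    is tested given the union of their draft parents. Returns
    (unsupported, missing): draft edges whose endpoints look independent,
    and non-adjacent pairs that still look dependent.
    """
    col = {name: k for k, name in enumerate(names)}
    n = data.shape[0]
    parents = {name: set() for name in names}
    for u, v in draft_edges:
        parents[v].add(u)

    unsupported, missing = [], []
    for pos, a in enumerate(names):
        for b in names[pos + 1:]:
            adjacent = (a, b) in draft_edges or (b, a) in draft_edges
            cond = (parents[a] | parents[b]) - {a, b}
            r = partial_corr(data, col[a], col[b], [col[c] for c in cond])
            p = fisher_z_pvalue(r, n, len(cond))
            if adjacent and p > alpha:
                unsupported.append((a, b))
            elif not adjacent and p <= alpha:
                missing.append((a, b))
    return unsupported, missing
```

For example, on data generated by a chain x → y → z, calling audit_draft(data, ["x", "y", "z"], [("x", "y"), ("y", "z"), ("x", "z")]) would be expected to flag (x, z) as unsupported, because x and z are independent given y. How DMCD converts such discrepancies into edge revisions is not specified in this summary.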

Taxonomy

Topics: Bayesian Modeling and Causal Inference · Advanced Graph Neural Networks · Explainable Artificial Intelligence (XAI)