Competition is the key: A Game Theoretic Causal Discovery Approach

Amartya Roy; Souvik Chakraborty

arXiv:2510.20106 · cs.LG · October 24, 2025

Open Access · 3 Reviews

TL;DR

This paper introduces a game-theoretic reinforcement learning method for causal discovery that comes with finite-sample, high-probability correctness guarantees, scales to large graphs, and outperforms the baselines it competes against (GES and GraN-DAG) on synthetic and real datasets.

Contribution

It presents a novel RL-based causal discovery framework with provable guarantees, bridging the gap between empirical performance and theoretical assurances.

Findings

01

Achieves high-probability correctness in selecting the true graph

02

Scales to large graphs with up to 220 nodes

03

Consistently outperforms GES and GraN-DAG on benchmarks

Abstract

Causal discovery remains a central challenge in machine learning, yet existing methods face a fundamental gap: algorithms like GES and GraN-DAG achieve strong empirical performance but lack finite-sample guarantees, while theoretically principled approaches fail to scale. We close this gap by introducing a game-theoretic reinforcement learning framework for causal discovery, where a DDQN agent directly competes against a strong baseline (GES or GraN-DAG), always warm-starting from the opponent's solution. This design yields three provable guarantees: the learned graph is never worse than the opponent, warm-starting strictly accelerates convergence, and most importantly, with high probability the algorithm selects the true best candidate graph. To the best of our knowledge, our result makes first-of-its-kind progress in explaining such finite-sample guarantees in causal discovery: on…
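The "never worse than the opponent" guarantee rests on two ingredients named in the abstract: start from the opponent's graph and keep a candidate only if it scores better. The sketch below illustrates just that mechanism under stated assumptions. It substitutes plain greedy hill climbing over single-edge moves (add, delete, reverse) for the paper's DDQN agent, and it takes an arbitrary score function as given. All names (`is_acyclic`, `neighbours`, `refine_from_opponent`, `toy_score`) are hypothetical and are not the authors' code.

```python
from itertools import permutations
from typing import Callable, FrozenSet, Iterator, List, Tuple

Edge = Tuple[int, int]
Graph = FrozenSet[Edge]


def is_acyclic(n: int, edges: Graph) -> bool:
    """Kahn's algorithm: True iff the directed graph on nodes 0..n-1 has no cycle."""
    indeg = [0] * n
    children: List[List[int]] = [[] for _ in range(n)]
    for u, v in edges:
        children[u].append(v)
        indeg[v] += 1
    stack = [i for i in range(n) if indeg[i] == 0]
    visited = 0
    while stack:
        u = stack.pop()
        visited += 1
        for v in children[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                stack.append(v)
    return visited == n


def neighbours(n: int, g: Graph) -> Iterator[Graph]:
    """Yield every DAG that differs from g by one edge deletion, reversal, or addition."""
    for u, v in permutations(range(n), 2):
        if (u, v) in g:
            yield g - {(u, v)}                      # delete u->v (always acyclic)
            reversed_g = (g - {(u, v)}) | {(v, u)}
            if is_acyclic(n, reversed_g):
                yield reversed_g                    # reverse u->v
        elif (v, u) not in g:
            added = g | {(u, v)}
            if is_acyclic(n, added):
                yield added                         # add u->v


def refine_from_opponent(
    n: int,
    opponent: Graph,
    score: Callable[[Graph], float],
    max_steps: int = 100,
) -> Tuple[Graph, float]:
    """Hypothetical stand-in for the RL agent: greedy local search warm-started
    at the opponent's DAG. Only strict improvements are accepted, so the result
    never scores below the opponent's graph."""
    current = frozenset(opponent)
    current_score = score(current)
    for _ in range(max_steps):
        best, best_score = None, current_score
        for cand in neighbours(n, current):
            s = score(cand)
            if s > best_score:
                best, best_score = cand, s
        if best is None:  # local optimum: no single-edge move improves the score
            break
        current, current_score = best, best_score
    return current, current_score


# Usage with a hypothetical toy score: negative symmetric difference to a hidden
# "true" DAG (a real method would use a data-driven score such as BIC).
true_dag: Graph = frozenset({(0, 1), (1, 2), (0, 3)})


def toy_score(g: Graph) -> float:
    return -len(g ^ true_dag)


# The opponent's (baseline's) output: one edge reversed, one edge missing.
opponent: Graph = frozenset({(1, 0), (1, 2)})
refined, refined_score = refine_from_opponent(4, opponent, toy_score)
```

In this sketch the guarantee follows entirely from the warm start plus the strict-improvement rule. A learned policy that also explores non-improving moves would need an explicit final comparison against the opponent's score to retain the same property.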

Peer Reviews

Decision · Submitted to ICLR 2026

Reviewer 01 · Rating 2 · Confidence 5

Strengths

- Tackling theoretical questions in Reinforcement Learning (RL) is a significant and sparse area of research, and the paper's engagement with this aspect is a clear strength. This focus on theoretical underpinnings helps to advance the field by providing a more robust foundation for future work in RL for causal discovery.
- It's a promising idea to enhance RL for causal discovery by initiating the process with a graph determined by another algorithm. As the authors highlight in their theoretical

Weaknesses

- While the algorithm effectively combines existing elements, its originality is somewhat constrained by the straightforward application of established concepts.
- While the paper touches upon foundational and some contemporary literature, a more comprehensive exploration of recent advancements (such as GFlowNets [1], RL for causal discovery [2], and other state-of-the-art methods [3]) would greatly enhance the understanding of its impact and relevance within the field.
- Given the inherent desig

Reviewer 02 · Rating 2 · Confidence 4

Strengths

The game-theoretic framing of causal discovery as RL-guided refinement of classical baselines is novel, and providing finite-sample guarantees for RL-based causal discovery addresses an important gap between empirically strong but theoretically ungrounded methods. The experiments span a large variety of real-world benchmarks of different sizes.

Weaknesses

Main Concerns:
1. Unclear necessity of RL: Why is RL specifically needed for refinement rather than any other iterative improvement method? The theoretical proofs do not seem to explicitly leverage properties unique to RL algorithms, raising questions about what the RL component contributes beyond a standard local search procedure. The authors do not make clear what the added benefit of an RL approach is.
2. Marginal empirical improvements: For Asia, Sachs, Lucas, and Hepar2, there is n

Reviewer 03 · Rating 2 · Confidence 4

Strengths

The paper proposes a causal discovery algorithm based on reinforcement learning. Their method comes with some theoretical guarantees. I found the high-probability, finite-sample results particularly valuable; such results are often overlooked in the literature, which mostly focuses on theoretical guarantees at the population level. Moreover, the authors produced a nice collection of real-world datasets over which to benchmark their algorithm. This strengthens the value of their empirical analysis,

Weaknesses

**Presentation and clarity.**
- The paper lacks any technical introduction to reinforcement learning, graph theory, and the problem of causal discovery. These are the key ingredients of the inference problem at hand and of the algorithm the authors design to solve it. Readability would improve if the terminology used in the paper were properly introduced.
- I am a bit confused by the title in relation to the content of the paper: where is the *game-theoretic* part? M

Taxonomy

Topics: Bayesian Modeling and Causal Inference · Explainable Artificial Intelligence (XAI) · Advanced Graph Neural Networks