Multi-Agent Causal Discovery Using Large Language Models

Hao Duong Le; Xin Xia; Zhang Chen

arXiv:2407.15073 · cs.AI · February 25, 2025 · 5 citations

Open Access · 5 Reviews

TL;DR

This paper introduces MAC, a multi-agent framework utilizing large language models to improve causal discovery by integrating structured data and metadata, outperforming existing methods across multiple datasets.

Contribution

The paper presents a novel multi-agent causal discovery framework that combines debating and coding modules with meta-fusion, leveraging LLMs to enhance causal inference from complex data and metadata.

Findings

01

MAC outperforms traditional causal discovery methods.

02

MAC achieves state-of-the-art results on five datasets.

03

The framework effectively integrates structured data and metadata.

Abstract

Causal discovery aims to identify causal relationships between variables and is a critical research area in machine learning. Traditional methods focus on statistical or machine learning algorithms to uncover causal links from structured data, often overlooking the valuable contextual information provided by metadata. Large language models (LLMs) have shown promise in creating unified causal discovery frameworks by incorporating both structured data and metadata. However, their potential in multi-agent settings remains largely unexplored. To address this gap, we introduce the Multi-Agent Causal Discovery Framework (MAC), which consists of two key modules: the Debate-Coding Module (DCM) and the Meta-Debate Module (MDM). The DCM begins with a multi-agent debating and coding process, where agents use both structured data and metadata to collaboratively select the most suitable statistical…
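The abstract describes agents debating to select a suitable statistical method. As an illustration only, and not the authors' implementation, the Python sketch below assumes each agent is a plain function (standing in for an LLM call) that maps the previous round's proposals to a method name. The helpers `debate_select`, `stubborn`, and `follower` are hypothetical, and "PC", "GES", and "LiNGAM" are used only as example algorithm names. The loop stops at unanimity, or falls back to a plurality vote once a fixed round budget is spent.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple

# Hypothetical agent interface: takes the previous round's proposals
# (None in the first round) and returns the name of a causal discovery method.
Agent = Callable[[Optional[List[str]]], str]


def debate_select(agents: List[Agent], max_rounds: int = 3) -> Tuple[str, int]:
    """Run debate rounds until all agents agree or the round budget runs out.

    Returns the selected method and the number of rounds used.
    """
    if not agents or max_rounds < 1:
        raise ValueError("need at least one agent and one round")
    previous: Optional[List[str]] = None
    for round_idx in range(max_rounds):
        # Every agent sees the same snapshot of the previous round.
        proposals = [agent(previous) for agent in agents]
        if len(set(proposals)) == 1:
            return proposals[0], round_idx + 1  # unanimous: stop early
        previous = proposals
    # No consensus within the budget: plurality vote over the last round
    # (Counter.most_common breaks ties by first-seen order).
    winner, _ = Counter(previous).most_common(1)[0]
    return winner, max_rounds


def stubborn(method: str) -> Agent:
    """Toy agent that always proposes the same method."""
    return lambda previous: method


def follower(first: str) -> Agent:
    """Toy agent that starts with `first`, then adopts the previous plurality."""
    def agent(previous: Optional[List[str]]) -> str:
        if previous is None:
            return first
        return Counter(previous).most_common(1)[0][0]
    return agent


print(debate_select([stubborn("PC"), stubborn("PC"), follower("GES")]))  # ('PC', 2)
```

The hard cap on rounds is one simple way to rule out the endless oscillation Reviewer 02 raises; it bounds the run time but says nothing about whether the selected method is the right one.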

Peer Reviews

Decision · ICLR 2025 Conference Withdrawn Submission

Reviewer 01 · Rating 1 · Confidence 3

Strengths

The idea of using multi-agent systems to solve causal discovery problem is interesting.

Weaknesses

General: I found it a bit hard to assess the significance of this work, partly because the writing (especially on the experimental setup) is a bit opaque, and I did not find the results to be super exciting. The constant stylistic and grammatical imperfections also seem to suggest this paper was written in a rush. I would also recommend that the authors proofread the paper and make the writing clearer. Furthermore, mechanically, I do not see how using a multi-agent setup is fundamentally better. I…

Reviewer 02 · Rating 3 · Confidence 4

Strengths

1. Comprehensive Multi-Agent Approach: The paper introduces a well-structured framework that incorporates debate, planning, and coding agents to address causal inference, allowing MAC to explore causal discovery from multiple angles (e.g., reasoning and statistical analysis).
2. Thorough Evaluation Metrics: The study uses detailed metrics (SHD, NHD) across three datasets with varying complexities, providing an empirical basis for evaluating each model’s effectiveness. This comparison offers val…

Weaknesses

1. Lack of Formal Framework for Agentic Workflow Convergence: Although MAC leverages multi-agent interactions for causal discovery, it lacks a formal mathematical model that defines or ensures convergence of the agents' outputs to a stable causal graph. In high-dimensional causal graphs, where causal inference is combinatorially complex, the absence of convergence guarantees can lead to oscillating or suboptimal solutions.
2. Inadequate Quantification of Noise Propagation in Causal Inference: W…
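Reviewer 02 cites SHD and NHD as the evaluation metrics. For readers unfamiliar with them, the Python sketch below computes both from 0/1 adjacency matrices, where `A[i][j] == 1` means an edge i → j. It follows one common convention (missing, extra, and reversed edges each count once toward SHD; NHD is the fraction of mismatched matrix entries, normalized by d² for d variables). The paper's exact definitions may differ, and the function names `shd` and `nhd` are illustrative.

```python
from typing import Sequence

# Adjacency matrix of a directed graph: m[i][j] == 1 means edge i -> j.
Matrix = Sequence[Sequence[int]]


def shd(true_adj: Matrix, pred_adj: Matrix) -> int:
    """Structural Hamming Distance: missing, extra, and reversed edges each cost 1."""
    n = len(true_adj)
    dist = 0
    for i in range(n):
        for j in range(i + 1, n):
            # Compare how the unordered pair {i, j} is connected in each graph.
            if (true_adj[i][j], true_adj[j][i]) != (pred_adj[i][j], pred_adj[j][i]):
                dist += 1
    return dist


def nhd(true_adj: Matrix, pred_adj: Matrix) -> float:
    """Normalized Hamming Distance: share of adjacency entries that disagree."""
    n = len(true_adj)
    mismatches = sum(
        true_adj[i][j] != pred_adj[i][j] for i in range(n) for j in range(n)
    )
    return mismatches / (n * n)


# Ground truth: 0 -> 1 -> 2. Prediction: 1 -> 0 (reversed), 1 -> 2, 0 -> 2 (extra).
truth = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
pred = [[0, 0, 1], [1, 0, 1], [0, 0, 0]]
print(shd(truth, pred))  # 2
print(nhd(truth, pred))  # 0.3333333333333333
```

The reversed edge costs 1 under SHD but appears as two mismatched entries in NHD, which is one reason the two metrics can rank methods differently.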

Reviewer 03 · Rating 5 · Confidence 3

Strengths

- It is the first to leverage an LLM multi-agent system for causal discovery.
- The paper proposes a multi-agent framework for causality, presenting three distinct models that demonstrate strong performance in experiments on real datasets.
- It provides an extensive comparison of diverse LLM models and includes an informative ablation analysis to illustrate the necessity of each key component in the proposed method.

Weaknesses

- The paper’s methodological novelty may be somewhat limited, as its primary contribution appears to be the use of a multi-agent debating mechanism to improve the precision of causal reasoning, with much of the effort focused on constructing prompts tailored to causal discovery-specific problems. It would be valuable if the authors could further elaborate on any additional innovative aspects of the design.
- Clarity could be improved throughout the paper. For example, i) the distinction between…

Reviewer 04 · Rating 3 · Confidence 4

Strengths

The use of cooperation and competition among multiple agents enhances the accuracy and comprehensiveness of causal discovery. The framework combines the characteristics of different models for more efficient and flexible causal analysis. The paper demonstrates the performance of different models on various datasets, using metrics such as SHD and NHD for comparison.

Exploration of LLM Potential in Causal Inference: This is the first in-depth study of LLMs’ multi-agent approaches in causal disc…

Weaknesses

A hybrid model can balance statistical inference with the reasoning abilities of a language model, making it suitable for tasks with high complexity. However, the hybrid approach does not consistently outperform other methods by a significant margin. While multi-agent systems can enhance model performance, their effectiveness largely depends on the complexity and structure of the dataset. For example, Coding Agents perform well on moderately complex datasets, but their performance may fall short for simp…

Reviewer 05 · Rating 5 · Confidence 3

Strengths

- The authors aim to establish a multi-LLM agent framework for addressing causal discovery problems, leveraging advanced statistical algorithms and the internal knowledge of LLMs to achieve more accurate discovery results, as LLMs alone may not produce optimal outcomes in causal discovery.
- The authors conduct a full graph discovery experiment using the latest LLM models.
- The Meta-Debate module is designed to enhance the reasoning capabilities of LLMs.

Weaknesses

- There are some formatting errors, such as unresolved references like "Figure ?."
- Given that the causal discovery results provided by LLMs alone sometimes lack stability [altering the order of variables in the prompts can impact the results], presenting statistical significance results and trying more data sources could enhance the validity of the findings.
- The experiments might not fully capture the complexity of real-world datasets. The generalizability of the model to more div…
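Reviewer 05 notes that LLM outputs can change with the order of variables in the prompt. One hedged way to quantify this, sketched below in Python, is to re-run the same query under several random variable orderings and report how often each directed edge appears. The predictor passed in stands for a hypothetical LLM-backed edge predictor; here it is replaced by toy functions (`stable`, `order_biased`) so the example runs offline, and `edge_stability` is an illustrative name, not part of MAC.

```python
import random
from collections import Counter
from typing import Callable, Dict, FrozenSet, Sequence, Tuple

Edge = Tuple[str, str]  # (cause, effect)
# Hypothetical predictor: receives variables in prompt order, returns directed edges.
EdgePredictor = Callable[[Sequence[str]], FrozenSet[Edge]]


def edge_stability(
    predict_edges: EdgePredictor,
    variables: Sequence[str],
    n_orders: int = 20,
    seed: int = 0,
) -> Dict[Edge, float]:
    """Fraction of shuffled prompt orders in which each edge is predicted."""
    rng = random.Random(seed)  # fixed seed keeps the orderings reproducible
    counts: Counter = Counter()
    for _ in range(n_orders):
        order = list(variables)
        rng.shuffle(order)
        counts.update(predict_edges(order))  # count each predicted edge once
    return {edge: count / n_orders for edge, count in counts.items()}


# Toy predictors: one ignores the order, one points from the first-listed variable.
def stable(order: Sequence[str]) -> FrozenSet[Edge]:
    return frozenset({("rain", "wet_grass")})


def order_biased(order: Sequence[str]) -> FrozenSet[Edge]:
    return frozenset({(order[0], order[1])})


print(edge_stability(stable, ["rain", "wet_grass"]))  # {('rain', 'wet_grass'): 1.0}
print(edge_stability(order_biased, ["rain", "wet_grass"]))  # both directions; frequencies sum to 1.0
```

An order-insensitive predictor yields frequency 1.0 for its edges, while order-sensitive edges spread across directions; repeating with more orderings or several seeds would give the kind of statistical significance reporting the reviewer asks for.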

Taxonomy

Topics: Data Quality and Management · Data Mining Algorithms and Applications · Biomedical Text Mining and Ontologies