LLM-based Agents for Automated Confounder Discovery and Subgroup Analysis in Causal Inference

Po-Han Lee; Yu-Cheng Lin; Chan-Tung Ku; Chan Hsu; Pei-Cing Huang; Ping-Hsun Wu; Yihuang Kang

arXiv:2508.07221 · cs.LG · August 12, 2025


TL;DR

This paper introduces LLM-based agents that automate confounder discovery and subgroup analysis in causal inference, improving robustness and interpretability in real-world observational data analysis.

Contribution

The work presents a novel framework integrating LLM-based agents into causal ML pipelines for automated confounder detection and subgroup analysis, reducing human effort and enhancing interpretability.

Findings

01 More robust treatment effect estimates on medical datasets.

02 Narrower confidence intervals, indicating more precise estimates.

03 Previously unrecognized confounding biases were uncovered.

Abstract

Estimating individualized treatment effects from observational data is a persistent challenge because of unmeasured confounding and structural bias. Causal Machine Learning (causal ML) methods, such as causal trees and doubly robust estimators, provide tools for estimating conditional average treatment effects. However, their effectiveness in complex real-world settings is limited when confounders are latent or are recorded only in unstructured formats. Moreover, relying on domain experts to identify confounders and interpret rules incurs high annotation costs and limits scalability. In this work, we propose Large Language Model-based agents for automated confounder discovery and subgroup analysis, integrating the agents into the causal ML pipeline to simulate domain expertise. Our framework systematically performs subgroup identification and…
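The abstract names doubly robust estimators as one of the causal ML tools for conditional average treatment effects. The sketch below is not the authors' agent framework; it illustrates the standard augmented inverse propensity weighting (AIPW) pseudo-outcome that doubly robust estimators build on, and then averages it within subgroups. It assumes the nuisance estimates (outcome predictions `mu0`, `mu1` and propensity scores `e`) have already been fitted elsewhere, and the subgroup labels stand in, hypothetically, for rules an agent might propose. The function names `aipw_pseudo_outcomes` and `subgroup_effects` are illustrative only.

```python
from statistics import mean

def aipw_pseudo_outcomes(y, t, mu0, mu1, e):
    """Doubly robust (AIPW) pseudo-outcome for each unit (illustrative sketch).

    y: observed outcomes; t: binary treatment indicators (0/1);
    mu0, mu1: assumed pre-fitted outcome predictions under control/treatment;
    e: assumed pre-fitted propensity scores P(T=1 | X), strictly in (0, 1).
    """
    out = []
    for yi, ti, m0, m1, ei in zip(y, t, mu0, mu1, e):
        if not 0.0 < ei < 1.0:
            raise ValueError("propensity scores must lie strictly in (0, 1)")
        # Outcome-model contrast plus inverse-propensity-weighted residual corrections.
        phi = (m1 - m0
               + ti * (yi - m1) / ei
               - (1 - ti) * (yi - m0) / (1 - ei))
        out.append(phi)
    return out

def subgroup_effects(pseudo, groups):
    """Average pseudo-outcomes within each (hypothetical) subgroup label."""
    by_group = {}
    for phi, g in zip(pseudo, groups):
        by_group.setdefault(g, []).append(phi)
    return {g: mean(v) for g, v in by_group.items()}

# Toy data: four units split into two hypothetical subgroups "A" and "B".
pseudo = aipw_pseudo_outcomes(
    y=[3, 1, 2, 1], t=[1, 0, 1, 0],
    mu0=[1, 1, 0, 0], mu1=[3, 3, 1, 1], e=[0.5] * 4)
effects = subgroup_effects(pseudo, ["A", "A", "B", "B"])
print(effects)  # {'A': 2.0, 'B': 1.0}
```

The pseudo-outcome stays unbiased for the treatment effect if either the outcome model or the propensity model is correct, which is what "doubly robust" refers to; averaging it within subgroups gives a simple per-subgroup effect summary.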


Taxonomy

Topics: Advanced Causal Inference Techniques · Machine Learning in Healthcare · Bayesian Modeling and Causal Inference