End-To-End Causal Effect Estimation from Unstructured Natural Language Data

Nikita Dhawan, Leonardo Cotta, Karen Ullrich, Rahul G. Krishnan, Chris J. Maddison

arXiv:2407.07018 · cs.LG · October 29, 2024 · 1 citation

TL;DR

This paper introduces NATURAL, a novel method leveraging large language models to estimate causal effects directly from unstructured text data, reducing reliance on manual data curation and enabling cost-effective, automated causal inference.

Contribution

The paper presents NATURAL, a new family of causal estimators that operate on unstructured text using LLMs, automating data curation and imputation for causal effect estimation.

Findings

1. NATURAL estimates are within 3 percentage points of ground truth.
2. The method performs well on both synthetic and real-world datasets.
3. It achieves accurate causal effect estimates even on complex clinical trial data.

Abstract

Knowing the effect of an intervention is critical for human decision-making, but current approaches for causal effect estimation rely on manual data collection and structuring, regardless of the causal assumptions. This increases both the cost and time-to-completion for studies. We show how large, diverse observational text data can be mined with large language models (LLMs) to produce inexpensive causal effect estimates under appropriate causal assumptions. We introduce NATURAL, a novel family of causal effect estimators built with LLMs that operate over datasets of unstructured text. Our estimators use LLM conditional distributions (over variables of interest, given the text data) to assist in the computation of classical estimators of causal effect. We overcome a number of technical challenges to realize this idea, such as automating data curation and using LLMs to impute missing…
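
The abstract's core idea, using LLM conditional distributions over variables of interest to assist a classical estimator, can be illustrated with a minimal sketch. Everything below is an assumption for illustration, not the authors' NATURAL estimator. Each report is summarized by hypothetical LLM-derived probabilities for a discrete covariate X, a binary treatment T, and a binary outcome Y. X, T, and Y are treated as independent given the text. The classical estimator is inverse propensity weighting (IPW), applied by taking expectations over each report's distribution. The function names `joint` and `soft_ipw_ate` and the dictionary layout are hypothetical.

```python
from itertools import product

# Hypothetical per-report format (an assumption, not the paper's):
#   {"x": {value: P(X=value | text)}, "t": P(T=1 | text), "y": P(Y=1 | text)}
# Simplifying assumption: X, T and Y are independent given the text, so the
# joint distribution of one report is the product of these marginals.


def joint(report):
    """Yield (x, t, y, weight) for every value combination with positive weight."""
    for (x, px), t, y in product(report["x"].items(), (0, 1), (0, 1)):
        pt = report["t"] if t == 1 else 1.0 - report["t"]
        py = report["y"] if y == 1 else 1.0 - report["y"]
        w = px * pt * py
        if w > 0.0:  # zero-weight combinations contribute nothing; skipping them avoids 0/0
            yield x, t, y, w


def soft_ipw_ate(reports):
    """Plug-in IPW estimate of E[Y(1)] - E[Y(0)] from soft (probabilistic) records."""
    # Propensity e(x) = P(T=1 | X=x), estimated from expected counts over all reports.
    treated, total = {}, {}
    for r in reports:
        for x, t, _y, w in joint(r):
            total[x] = total.get(x, 0.0) + w
            treated[x] = treated.get(x, 0.0) + w * t
    propensity = {x: treated[x] / total[x] for x in total}

    # Each report contributes its expected IPW term under its own distribution.
    estimate = 0.0
    for r in reports:
        for x, t, y, w in joint(r):
            e = propensity[x]
            if t == 1:
                estimate += w * y / e
            else:
                estimate -= w * y / (1.0 - e)
    return estimate / len(reports)
```

With hard labels (all probabilities 0 or 1) this reduces to ordinary IPW. For example, four reports sharing one covariate value, two treated (one with Y=1) and two untreated (both with Y=0), give an estimate of 0.5, which equals the difference in means. Soft labels let uncertain extractions contribute fractionally instead of forcing a single imputed value.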

Videos

End-To-End Causal Effect Estimation from Unstructured Natural Language Data (SlidesLive)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Bayesian Modeling and Causal Inference