CSQL: Mapping Documents into Causal Databases

Sridhar Mahadevan

arXiv:2601.08109·cs.DB·January 14, 2026

Open Access

TL;DR

CSQL is a system that automatically converts unstructured text documents into a structured causal database, enabling complex causal queries and analysis across large document collections in various domains.

Contribution

The paper introduces CSQL, a system that transforms unstructured documents into SQL-queryable causal databases. Unlike RAG-based or knowledge-graph-based approaches, which offer purely associative retrieval, CSQL supports causal analysis over document collections.

Findings

01

Successfully converts documents into causal databases.

02

Enables complex causal queries over large corpora.

03

Demonstrates application in economics and other fields.

Abstract

We describe a novel system, CSQL, which automatically converts a collection of unstructured text documents into an SQL-queryable causal database (CDB). A CDB differs from a traditional DB: it is designed to answer "why" questions via causal interventions and structured causal queries. CSQL builds on our earlier system, DEMOCRITUS, which converts documents into thousands of local causal models derived from causal discourse. Unlike RAG-based systems or knowledge-graph based approaches, CSQL supports causal analysis over document collections rather than purely associative retrieval. For example, given an article on the origins of human bipedal walking, CSQL enables queries such as: "What are the strongest causal influences on bipedalism?" or "Which variables act as causal hubs with the largest downstream influence?" Beyond single-document case studies, we show that CSQL can also ingest…
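
To make the two example queries in the abstract concrete, here is a minimal sketch of how they might look against an SQL table of causal edges. Everything in it is an assumption made for illustration: the `causal_edges` table, its `cause`, `effect`, and `strength` columns, the toy rows, and the helper names `strongest_influences` and `causal_hubs` are hypothetical and are not taken from the paper. "Downstream influence" is approximated here by the number of variables reachable along causal edges, which is only one plausible reading of the term.

```python
import sqlite3

# Hypothetical schema and toy rows, invented purely for illustration.
# The actual CDB schema and scores used by CSQL are not given in this excerpt.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE causal_edges (cause TEXT, effect TEXT, strength REAL)")
conn.executemany(
    "INSERT INTO causal_edges VALUES (?, ?, ?)",
    [
        ("climate_change", "habitat_shift", 0.8),
        ("habitat_shift", "bipedalism", 0.6),
        ("energy_efficiency", "bipedalism", 0.9),
        ("bipedalism", "tool_use", 0.7),
    ],
)


def strongest_influences(target):
    """Direct causes of `target`, strongest first."""
    return conn.execute(
        "SELECT cause, strength FROM causal_edges "
        "WHERE effect = ? ORDER BY strength DESC",
        (target,),
    ).fetchall()


def causal_hubs():
    """Variables ranked by how many other variables they can reach."""
    return conn.execute(
        """
        WITH RECURSIVE reach(src, dst) AS (
            SELECT cause, effect FROM causal_edges
            UNION  -- UNION (not UNION ALL) deduplicates, so cycles terminate
            SELECT reach.src, e.effect
            FROM reach JOIN causal_edges AS e ON e.cause = reach.dst
        )
        SELECT src, COUNT(DISTINCT dst) AS downstream
        FROM reach
        GROUP BY src
        ORDER BY downstream DESC, src
        """
    ).fetchall()


print(strongest_influences("bipedalism"))
print(causal_hubs())
```

The recursive common table expression computes the transitive closure of the edge table, so a hub is scored by every variable it influences directly or indirectly. A weighted notion of influence, or one based on interventions as the abstract describes, would need more than this reachability count.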

Taxonomy

Topics: Philosophy and History of Science · Bayesian Modeling and Causal Inference · Logic, Reasoning, and Knowledge