Text2SQL is Not Enough: Unifying AI and Databases with TAG

Asim Biswal; Liana Patel; Siddarth Jha; Amog Kamsetty; Shu Liu; Joseph E. Gonzalez; Carlos Guestrin; Matei Zaharia

arXiv:2408.14717 · cs.DB · August 28, 2024 · 3 cites

TL;DR

This paper introduces TAG, a unified approach to answer natural language questions over databases by combining language models and data systems, revealing current methods' limitations and proposing new research directions.

Contribution

The paper proposes Table-Augmented Generation (TAG), a novel paradigm unifying AI and databases, and develops benchmarks to evaluate its effectiveness.

Findings

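1. Standard methods answer no more than 20% of the benchmark queries correctly.

2. Existing approaches cover only narrow question types: Text2SQL handles questions expressible in relational algebra, and RAG handles questions answerable by point lookups over one or a few records. Both miss the broader range of questions users want to ask.

3. The paper provides a new benchmark for future research.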

Abstract

AI systems that serve natural language questions over databases promise to unlock tremendous value. Such systems would allow users to leverage the powerful reasoning and knowledge capabilities of language models (LMs) alongside the scalable computational power of data management systems. These combined capabilities would empower users to ask arbitrary natural language questions over custom data sources. However, existing methods and benchmarks insufficiently explore this setting. Text2SQL methods focus solely on natural language questions that can be expressed in relational algebra, representing a small subset of the questions real users wish to ask. Likewise, Retrieval-Augmented Generation (RAG) considers the limited subset of queries that can be answered with point lookups to one or a few data records within the database. We propose Table-Augmented Generation (TAG), a unified and…
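The two limited settings the abstract describes can be contrasted with a small sketch. This is an illustration only, not code from the paper or from tag-bench: the `movies` table, its rows, and the helper names `relational_query` and `point_lookup` are all hypothetical, and no language model is actually called.

```python
import sqlite3

# Hypothetical toy table; the schema and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE movies (title TEXT, year INTEGER, revenue REAL, review TEXT)"
)
conn.executemany(
    "INSERT INTO movies VALUES (?, ?, ?, ?)",
    [
        ("Alpha", 1999, 120.0, "A moving, heartfelt story."),
        ("Beta", 2005, 300.0, "Loud and forgettable."),
        ("Gamma", 2005, 80.0, "Quietly brilliant."),
    ],
)


def relational_query(conn):
    """Text2SQL-style question: 'total revenue per year'.

    Fully expressible in relational algebra, so the database answers it alone.
    """
    return conn.execute(
        "SELECT year, SUM(revenue) FROM movies GROUP BY year ORDER BY year"
    ).fetchall()


def point_lookup(conn, title):
    """RAG-style question: 'what does the review of <title> say?'.

    Fetches a single record that would then be handed to an LM as context.
    """
    return conn.execute(
        "SELECT title, review FROM movies WHERE title = ?", (title,)
    ).fetchone()


by_year = relational_query(conn)
record = point_lookup(conn, "Gamma")
```

A question such as "which 2005 movie has the most positive review?" fits neither helper cleanly: it needs a filter over many rows and a judgment about review text that SQL cannot express. That kind of question is offered here as an example of the gap the abstract points to, not as one taken from the paper's benchmark.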

Code & Models

Repositories

tag-research/tag-bench (Official)

Taxonomy

Topics: Semantic Web and Ontologies

Methods: Focus