Hybrid Querying Over Relational Databases and Large Language Models
Fuheng Zhao, Divyakant Agrawal, Amr El Abbadi

TL;DR
This paper introduces SWAN, a benchmark for hybrid querying that combines relational databases with large language models. With GPT-4 Turbo and few-shot prompts, the authors report up to 40.0% execution accuracy and 48.2% data factuality on beyond-database questions.
Contribution
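It presents SWAN, the first cross-domain benchmark for hybrid querying (120 beyond-database questions over four real-world databases), and explores two solutions for integrating LLMs with databases: one based on schema expansion and one based on user-defined functions.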
Findings
Up to 40.0% execution accuracy with GPT-4 Turbo and few-shot prompts
Up to 48.2% data factuality in the same setting
Benchmark enables evaluation of hybrid querying approaches
Abstract
Database queries traditionally operate under the closed-world assumption, providing no answers to questions that require information beyond the data stored in the database. Hybrid querying using SQL offers an alternative by integrating relational databases with large language models (LLMs) to answer beyond-database questions. In this paper, we present the first cross-domain benchmark, SWAN, containing 120 beyond-database questions over four real-world databases. To leverage state-of-the-art language models in addressing these complex questions in SWAN, we present two solutions: one based on schema expansion and the other based on user-defined functions. We also discuss optimization opportunities and potential future directions. Our evaluation demonstrates that using GPT-4 Turbo with few-shot prompts, one can achieve up to 40.0% in execution accuracy and 48.2% in data factuality.…
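To make the user-defined-function idea from the abstract concrete, here is a minimal sketch of a SQL query that calls out to a model for a value the database does not store. It is not the authors' implementation: the `superhero` table, the `llm_lookup` function, and the lookup-table stub standing in for a real LLM call are all assumptions chosen so that the example runs offline.

```python
import sqlite3

# Hypothetical stand-in for an LLM call: a fixed lookup table so the
# sketch runs without network access. A real system would prompt a model.
_FAKE_LLM_KNOWLEDGE = {
    "Batman": "DC Comics",
    "Spider-Man": "Marvel Comics",
}


def llm_lookup(question, entity):
    """Return the stub's answer for `entity`, or None (SQL NULL) if unknown.

    `question` is ignored here; a real implementation would include it
    in the prompt sent to the model.
    """
    return _FAKE_LLM_KNOWLEDGE.get(entity)


def run_hybrid_query():
    """Run one query that mixes stored data with UDF-supplied values."""
    conn = sqlite3.connect(":memory:")
    # Illustrative schema: the publisher column is deliberately absent.
    conn.execute("CREATE TABLE superhero (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO superhero (name) VALUES (?)",
        [("Batman",), ("Spider-Man",), ("Unknown Hero",)],
    )
    # Register the Python function as a two-argument SQL function.
    conn.create_function("llm_lookup", 2, llm_lookup)
    rows = conn.execute(
        "SELECT name, "
        "llm_lookup('Which publisher owns this superhero?', name) AS publisher "
        "FROM superhero ORDER BY id"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for name, publisher in run_hybrid_query():
        print(name, publisher)
```

The query stays ordinary SQL, and the missing attribute is filled in per row at execution time. The schema-expansion alternative mentioned in the abstract would presumably materialize such values as extra columns or tables before querying, though the abstract does not spell out the details.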
Taxonomy
Topics: Data Mining Algorithms and Applications · Semantic Web and Ontologies
Methods: Linear Layer · Layer Normalization · Multi-Head Attention · Attention Is All You Need · Position-Wise Feed-Forward Layer · Adam · Byte Pair Encoding · Softmax · Absolute Position Encodings · Dense Connections
