SEMA-SQL: Beyond Traditional Relational Querying with Large Language Models
Yin Lin, Tianjing Zeng, Zhongjun Ding, Rong Zhu, Bolin Ding, H. V. Jagadish, and Jingren Zhou

TL;DR
SEMA-SQL integrates traditional relational querying with large language model reasoning to handle complex, real-world data questions beyond standard SQL, automating query generation, optimization, and execution.
Contribution
The paper introduces Hybrid Relational Algebra (HRA) and automates query generation, optimization, and execution, bringing LLM semantic reasoning into database queries.
Findings
Reduces LLM calls by 93% in semantic joins
Improves query capabilities beyond standard SQL
Demonstrates significant performance and capability gains on benchmarks
Abstract
Relational databases excel at structured data analysis, but real-world queries increasingly require capabilities beyond standard SQL, such as semantically matching entities across inconsistent names, extracting information not explicitly stored in schemas, and analyzing unstructured text. While text-to-SQL systems enable natural language querying, they remain limited to relational operations and cannot leverage the semantic reasoning capabilities of modern large language models (LLMs). Conversely, recent semantic operator systems extend relational algebra with LLM-powered operations (e.g., semantic joins, mappings, aggregations), but require users to manually construct complex query pipelines. To address this gap, we present SEMA-SQL, a system that automatically answers natural language questions by generating efficient queries that combine relational operations with LLM semantic…
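To make the idea of a semantic join concrete, here is a minimal Python sketch of matching entities across inconsistent names, and of why combining a relational step with the semantic one can lower the number of LLM calls. This is an illustration only, not SEMA-SQL's algorithm or its HRA operators: the matcher is a hypothetical stub standing in for an LLM call, the alias table and sample data are invented, and the hybrid variant assumes each entity matches at most one counterpart.

```python
from typing import Dict, List, Tuple

# Hypothetical alias table used only by the stub below; a real system would
# prompt an LLM instead of consulting a lookup table.
ALIASES: Dict[str, str] = {"ibm": "international business machines"}


def canonical(name: str) -> str:
    """Normalize a name and resolve known aliases (stub logic)."""
    key = name.lower().strip()
    return ALIASES.get(key, key)


class CountingMatcher:
    """Stand-in for an LLM 'same entity?' predicate that counts its calls."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, left: str, right: str) -> bool:
        self.calls += 1
        return canonical(left) == canonical(right)


def naive_semantic_join(
    left: List[str], right: List[str], matcher: CountingMatcher
) -> List[Tuple[str, str]]:
    """Ask the matcher about every pair: len(left) * len(right) calls."""
    return [(l, r) for l in left for r in right if matcher(l, r)]


def hybrid_semantic_join(
    left: List[str], right: List[str], matcher: CountingMatcher
) -> List[Tuple[str, str]]:
    """Resolve case-insensitive exact matches relationally, then send only
    the leftover rows to the matcher. Assumes one-to-one matching."""
    right_index: Dict[str, List[str]] = {}
    for r in right:
        right_index.setdefault(r.lower().strip(), []).append(r)

    pairs: List[Tuple[str, str]] = []
    unmatched_left: List[str] = []
    for l in left:
        hits = right_index.get(l.lower().strip())
        if hits:
            pairs.extend((l, r) for r in hits)
        else:
            unmatched_left.append(l)

    matched_right = {r for _, r in pairs}
    remaining_right = [r for r in right if r not in matched_right]
    pairs.extend(
        (l, r) for l in unmatched_left for r in remaining_right if matcher(l, r)
    )
    return pairs


# Invented sample data for illustration.
LEFT = ["IBM", "General Electric", "Acme Corp"]
RIGHT = ["International Business Machines", "general electric", "Acme Corp", "Globex"]

naive_matcher = CountingMatcher()
naive_pairs = naive_semantic_join(LEFT, RIGHT, naive_matcher)

hybrid_matcher = CountingMatcher()
hybrid_pairs = hybrid_semantic_join(LEFT, RIGHT, hybrid_matcher)

print("naive calls:", naive_matcher.calls)
print("hybrid calls:", hybrid_matcher.calls)
```

On this toy input both variants return the same pairs, while the hybrid variant invokes the matcher only for rows the exact-match step could not resolve. The size of the saving depends entirely on the data and says nothing about the 93% figure reported for SEMA-SQL.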
