SEED: Enhancing Text-to-SQL Performance and Practical Usability Through Automatic Evidence Generation

Janghyeon Yun; Sang-goo Lee

arXiv:2506.07423·cs.CL·June 10, 2025

TL;DR

SEED automatically generates evidence from database schemas to enhance text-to-SQL performance and usability, reducing reliance on human-provided evidence and improving model robustness in real-world applications.

Contribution

The paper introduces SEED, a novel automatic evidence generation method that improves text-to-SQL accuracy and practicality without requiring human-annotated evidence.

Findings

1. SEED significantly improves SQL accuracy in no-evidence scenarios.
2. SEED can outperform models with human-provided evidence in some cases.
3. SEED enhances model robustness and adaptability for real-world deployment.

Abstract

Text-to-SQL enables non-experts to retrieve data from databases by converting natural language queries into SQL. However, state-of-the-art text-to-SQL studies rely on the BIRD dataset, which assumes that evidence is provided along with questions. Although BIRD facilitates research advancements, it assumes that users have expertise and domain knowledge, contradicting the fundamental goal of text-to-SQL. In addition, human-generated evidence in BIRD contains defects, including missing or erroneous evidence, which affects model performance. To address this issue, we propose SEED (System for Evidence Extraction and Domain knowledge generation), an approach that automatically generates evidence to improve performance and practical usability in real-world scenarios. SEED systematically analyzes database schema, description files, and values to extract relevant information. We evaluated SEED…
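The abstract says SEED analyzes the database schema, description files, and values. As a rough illustration of the schema-and-values part only, the sketch below collects column types and a few sample values from a SQLite database and formats them as text that could be fed to an evidence-generating model. This is not the authors' implementation: the function names (`collect_schema_summary`, `format_for_prompt`, `build_demo_db`), the sample limit, and the toy `schools` table are all assumptions made for the example, and description files are not handled.

```python
import sqlite3


def collect_schema_summary(conn, sample_limit=3):
    """Return {table: {column: {"type": ..., "samples": [...]}}} for a SQLite database.

    Hypothetical helper for illustration; not part of SEED.
    """
    summary = {}
    table_rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in table_rows:
        columns = {}
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        info_rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        for _, name, col_type, *_ in info_rows:
            # A few distinct non-null values give a model hints about value formats.
            sample_rows = conn.execute(
                f'SELECT DISTINCT "{name}" FROM "{table}" '
                f'WHERE "{name}" IS NOT NULL ORDER BY 1 LIMIT ?',
                (sample_limit,),
            ).fetchall()
            columns[name] = {"type": col_type, "samples": [r[0] for r in sample_rows]}
        summary[table] = columns
    return summary


def format_for_prompt(summary):
    """Flatten the summary into one line per column."""
    lines = []
    for table, columns in summary.items():
        for name, info in columns.items():
            lines.append(f"{table}.{name} ({info['type']}): e.g. {info['samples']}")
    return "\n".join(lines)


def build_demo_db():
    """Create a tiny in-memory database (made-up data) to exercise the helpers."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE schools (id INTEGER PRIMARY KEY, name TEXT, charter INTEGER)"
    )
    conn.executemany(
        "INSERT INTO schools (name, charter) VALUES (?, ?)",
        [("Alpha High", 1), ("Beta Middle", 0), ("Gamma Elementary", 1)],
    )
    return conn


if __name__ == "__main__":
    print(format_for_prompt(collect_schema_summary(build_demo_db())))
```

Sampled values matter because a question such as "how many charter schools are there?" can only be mapped to `charter = 1` if the model knows how that column is encoded, which is the kind of hint human-written evidence would otherwise supply.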

Code & Models

Repositories

felix01189/seed (Official)
