Automatic database description generation for Text-to-SQL

Yingqi Gao; Zhiling Luo

arXiv:2502.20657·cs.AI·March 3, 2025

TL;DR

This paper introduces a method for automatically generating table and column descriptions for Text-to-SQL when explicit descriptions are unavailable. It uses a dual process, coarse-to-fine followed by fine-to-coarse, to improve schema understanding and SQL accuracy.

Contribution

A dual-process method (coarse-to-fine, then fine-to-coarse) for automatic database description generation, which improves schema understanding and SQL accuracy in Text-to-SQL tasks.

Findings

01. Generated descriptions improve SQL accuracy by 0.93%.

02. The method achieves 37% of human-level performance.

03. The method is validated on the Bird benchmark.

Abstract

In the Text-to-SQL task, table and column descriptions are crucial for bridging the gap between natural language and the database schema. This report proposes a method for automatically generating effective database descriptions when explicit descriptions are unavailable. The method employs a dual-process approach: a coarse-to-fine process followed by a fine-to-coarse process. The coarse-to-fine process leverages the inherent knowledge of the LLM to guide understanding from the database to its tables and finally to their columns, which provides a holistic view of the database structure and ensures contextual alignment. Conversely, the fine-to-coarse process starts at the column level, offering a more accurate and nuanced understanding when stepping back to the table level. Experimental results on the Bird benchmark indicate that using descriptions…
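The two passes described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `llm` callable, the prompt wording, the `db:` key prefix, and the function names `coarse_to_fine`, `fine_to_coarse`, and `describe_database` are all assumptions made for this example.

```python
from typing import Callable, Dict, List

Schema = Dict[str, List[str]]  # table name -> column names
LLM = Callable[[str], str]     # hypothetical interface: prompt in, text out


def coarse_to_fine(db_name: str, schema: Schema, llm: LLM) -> Dict[str, str]:
    """Describe the database first, then each table, then each column."""
    desc: Dict[str, str] = {}
    db_key = f"db:{db_name}"  # assumed key format, avoids clashing with table names
    desc[db_key] = llm(f"Describe database '{db_name}' with tables: {sorted(schema)}")
    for table, columns in schema.items():
        # The database-level description is passed down as context.
        desc[table] = llm(
            f"Database context: {desc[db_key]}\n"
            f"Describe table '{table}' with columns: {columns}"
        )
        for col in columns:
            # The table-level description is passed down as context.
            desc[f"{table}.{col}"] = llm(
                f"Table context: {desc[table]}\nDescribe column '{table}.{col}'"
            )
    return desc


def fine_to_coarse(schema: Schema, draft: Dict[str, str], llm: LLM) -> Dict[str, str]:
    """Refine column descriptions first, then step back to revise each table."""
    refined = dict(draft)
    for table, columns in schema.items():
        for col in columns:
            key = f"{table}.{col}"
            refined[key] = llm(f"Refine the description of column '{key}': {draft[key]}")
        # The refined column descriptions inform the revised table description.
        column_notes = "; ".join(refined[f"{table}.{c}"] for c in columns)
        refined[table] = llm(
            f"Given column descriptions [{column_notes}], "
            f"revise the description of table '{table}': {draft[table]}"
        )
    return refined


def describe_database(db_name: str, schema: Schema, llm: LLM) -> Dict[str, str]:
    """Run the coarse-to-fine pass, then the fine-to-coarse pass."""
    draft = coarse_to_fine(db_name, schema, llm)
    return fine_to_coarse(schema, draft, llm)
```

Any function that maps a prompt string to a completion string can be passed as `llm`, so the control flow can be exercised with a stub before connecting a real model. How the actual method samples column values, formats its prompts, or merges the two passes is not stated in the text above.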

Code & Models

Repositories

xgenerationlab/xiyan-dbdescgen (official)
