DB-GPT-Hub: Towards Open Benchmarking Text-to-SQL Empowered by Large Language Models
Fan Zhou, Siqiao Xue, Danrui Qi, Wenhui Shi, Wang Zhao, Ganglin Wei, Hongyang Zhang, Caigai Jiang, Gangwei Jiang, Zhixuan Chu, Faqiang Chen

TL;DR
DB-GPT-Hub introduces an open benchmark suite for fine-tuning large language models on text-to-SQL tasks, enabling systematic evaluation and comparison of tuning versus prompting methods in this domain.
Contribution
It provides a standardized, extensible benchmark for fine-tuning LLMs on text-to-SQL, addressing the high computational cost barrier and facilitating research in this area.
Findings
Fine-tuning LLMs can outperform prompting approaches in text-to-SQL.
Benchmark reveals performance boundaries of tuning methods.
Open-source code supports easy extension and experimentation.
Abstract
Large language models (LLMs) have become the dominant paradigm for the challenging task of text-to-SQL. LLM-empowered text-to-SQL methods are typically categorized into prompting-based and tuning-based approaches. Compared to prompting-based methods, benchmarking fine-tuned LLMs for text-to-SQL is important yet under-explored, partly because of the prohibitively high computational cost. In this paper, we present DB-GPT-Hub, an open benchmark suite for LLM-empowered text-to-SQL, which primarily focuses on tuning LLMs at large scales. The proposed benchmark consists of: 1. a standardized and comprehensive evaluation of text-to-SQL tasks by fine-tuning medium to large-sized open LLMs; 2. a modularized and easy-to-extend codebase with mainstream LLMs and experimental scenarios supported, which prioritizes fine-tuning methods but can be easily extended to prompt-based settings. Our work…
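To make the idea of a standardized text-to-SQL evaluation concrete, the sketch below scores predicted queries by execution match: a prediction counts as correct when it returns the same rows as the reference query on the same database. This is an illustration only. The abstract above does not state which metric DB-GPT-Hub uses, and the `execution_match` helper, the `singer` table, and the sample queries are all hypothetical, not taken from the benchmark's codebase.

```python
import sqlite3


def execution_match(db: sqlite3.Connection, predicted_sql: str, gold_sql: str) -> bool:
    """Return True if both queries yield the same rows, ignoring row order.

    Hypothetical helper for illustration; not part of DB-GPT-Hub.
    """
    try:
        predicted_rows = db.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        # A prediction that fails to execute is counted as a miss.
        return False
    gold_rows = db.execute(gold_sql).fetchall()
    # Sort by repr so rows with mixed or NULL values still compare safely.
    return sorted(predicted_rows, key=repr) == sorted(gold_rows, key=repr)


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory database (made-up schema and data)."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
        INSERT INTO singer VALUES (1, 'Ann', 31), (2, 'Bo', 24), (3, 'Cy', 45);
        """
    )
    return db


db = build_demo_db()
gold = "SELECT name FROM singer WHERE age > 30"
predictions = [
    "SELECT name FROM singer WHERE age >= 31",  # different text, same rows
    "SELECT name FROM singer WHERE age > 40",   # executes, but wrong rows
    "SELECT nam FROM singer",                   # invalid column name
]

scores = [execution_match(db, p, gold) for p in predictions]
accuracy = sum(scores) / len(scores)
print(scores, accuracy)
```

Comparing result sets rather than query strings means a prediction that is worded differently from the reference can still be judged correct, which matters when comparing outputs from tuned and prompted models that phrase equivalent SQL in different ways.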
Taxonomy
Topics: Scientific Computing and Data Management · Mathematics, Computing, and Information Processing
