Dubo-SQL: Diverse Retrieval-Augmented Generation and Fine Tuning for Text-to-SQL
Dayton G. Thorpe, Andrew J. Duberstein, Ian A. Kinsey

TL;DR
Dubo-SQL introduces new retrieval-augmented generation (RAG) and fine-tuning methods that improve execution accuracy (EX) on text-to-SQL while reducing cost and increasing speed, setting a new EX record on the BIRD-SQL holdout test set.
Contribution
The paper presents two new methods, Dubo-SQL v1 and v2, that improve text-to-SQL performance by combining low-cost fine-tuning, diverse RAG, and new input and output formats, surpassing prior methods on the BIRD-SQL benchmark.
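The summary does not describe how Dubo-SQL's diverse RAG picks its retrieved examples. Purely as an illustration of what "diverse" retrieval can mean, the sketch below swaps in maximal marginal relevance (MMR), a standard technique that trades relevance to the question against redundancy with examples already chosen. It is not the authors' method; the function names `cosine` and `mmr_select`, the `lam` weight, and the assumption that question embeddings are precomputed vectors are all hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(
    query: Sequence[float],
    candidates: List[Sequence[float]],
    k: int,
    lam: float = 0.5,
) -> List[int]:
    """Pick up to k candidate indices by maximal marginal relevance (MMR).

    lam=1.0 ranks purely by similarity to the query; smaller values
    penalize candidates that resemble examples already selected.
    Hypothetical helper: not taken from the Dubo-SQL paper.
    """
    selected: List[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i: int) -> float:
            relevance = cosine(query, candidates[i])
            # Highest similarity to anything already chosen (0.0 on the first pick).
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Toy 2-D "embeddings": candidate 1 is a near-duplicate of candidate 0.
    question = [1.0, 0.0]
    examples = [[1.0, 0.0], [0.99, 0.1], [0.6, 0.8]]
    print(mmr_select(question, examples, k=2, lam=1.0))  # [0, 1]: pure relevance
    print(mmr_select(question, examples, k=2, lam=0.3))  # [0, 2]: favors diversity
```

With `lam=1.0` the near-duplicate example is chosen second; lowering `lam` makes the selection skip it in favor of a less similar but still related example, which is the intuition behind retrieving diverse few-shot examples.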
Findings
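Dubo-SQL v1 sets a new record for execution accuracy (EX) on the BIRD-SQL holdout test set.
Dubo-SQL v1 uses the low-cost GPT-3.5 Turbo yet outperforms the next-best OpenAI-based model, which uses the more expensive GPT-4.
Dubo-SQL v2, which uses GPT-4 Turbo and RAG, achieves even higher EX on the BIRD-SQL dev set.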
Abstract
The current state-of-the-art (SOTA) for automated text-to-SQL still falls well short of expert human performance as measured by execution accuracy (EX) on the BIRD-SQL benchmark. The most accurate methods are also slow and expensive. To advance the SOTA for text-to-SQL while reducing cost and improving speed, we explore the combination of low-cost fine tuning, novel methods for diverse retrieval-augmented generation (RAG) and new input and output formats that help large language models (LLMs) achieve higher EX. We introduce two new methods, Dubo-SQL v1 and v2. Dubo-SQL v1 sets a new record for EX on the holdout test set of BIRD-SQL. Dubo-SQL v2 achieves even higher performance on the BIRD-SQL dev set. Dubo-SQL v1 relies on LLMs from OpenAI, but uses the low-cost GPT-3.5 Turbo while exceeding the performance of the next-best model using OpenAI, which instead uses the more expensive…
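To make the EX metric concrete: a prediction counts as correct when executing the predicted SQL returns the same results as executing the gold SQL, and EX is the fraction of correct predictions. The sketch below is a minimal version under stated assumptions: it uses an in-memory SQLite database, compares results as sets of rows (ignoring row order and duplicates, which is our understanding of the usual BIRD-SQL convention, not something stated on this page), and treats a query that fails to execute as incorrect. The function names `results_match` and `execution_accuracy` and the `patients` table are hypothetical.

```python
import sqlite3
from typing import List, Tuple


def results_match(conn: sqlite3.Connection, predicted_sql: str, gold_sql: str) -> bool:
    """Return True if both queries produce the same set of result rows."""
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        # A predicted query that fails to execute counts as incorrect.
        return False
    gold_rows = conn.execute(gold_sql).fetchall()
    # Assumption: set comparison, so row order and duplicate rows are ignored.
    return set(predicted_rows) == set(gold_rows)


def execution_accuracy(conn: sqlite3.Connection, pairs: List[Tuple[str, str]]) -> float:
    """Fraction of (predicted, gold) SQL pairs whose execution results match."""
    if not pairs:
        return 0.0
    correct = sum(results_match(conn, pred, gold) for pred, gold in pairs)
    return correct / len(pairs)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
        INSERT INTO patients (name, age) VALUES ('Ana', 34), ('Ben', 61), ('Cy', 47);
        """
    )
    pairs = [
        # Same rows in a different order: correct.
        ("SELECT name FROM patients WHERE age > 40 ORDER BY name DESC",
         "SELECT name FROM patients WHERE age > 40"),
        # Missing one row: incorrect.
        ("SELECT name FROM patients WHERE age > 50",
         "SELECT name FROM patients WHERE age > 40"),
        # Nonexistent column raises an error: incorrect.
        ("SELECT nme FROM patients", "SELECT name FROM patients"),
    ]
    print(execution_accuracy(conn, pairs))  # 0.3333333333333333
```

Because EX judges the query only by its results, two syntactically different queries (for example, with and without `ORDER BY`) can both be counted correct.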
Taxonomy
Topics: Advanced Database Systems and Queries · Scientific Computing and Data Management · Advanced Computational Techniques and Applications
Methods: Attention Is All You Need · Sparse Evolutionary Training · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Absolute Position Encodings · Dense Connections · Label Smoothing
