SWE-SQL: Illuminating LLM Pathways to Solve User SQL Issues in Real-World Applications
Jinyang Li, Xiaolong Li, Ge Qu, Per Jacobsson, Bowen Qin, Binyuan Hui, Shuzheng Si, Nan Huo, Xiaohan Xu, Yue Zhang, Ziwei Tang, Yuanshuai Li, Florensia Widjaja, Xintong Zhu, Feige Zhou, Yongfeng Huang, Yannis Papakonstantinou, Fatma Ozcan, Chenhao Ma, Reynold Cheng

TL;DR
This paper introduces BIRD-CRITIC, a comprehensive benchmark for evaluating LLMs on real-world SQL debugging, and presents Bird-Fixer, an open-source agent that significantly improves debugging success rates, advancing accessible SQL troubleshooting tools.
Contribution
The paper develops the BIRD-CRITIC benchmark datasets and proposes two training strategies, SQL-Rewind and f-Plan Boosting, to improve open-source LLMs at debugging SQL issues.
Findings
Bird-Fixer surpasses proprietary models in success rate.
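BIRD-CRITIC baselines reveal current models' limitations: the leading reasoning model, O3-Mini, reaches only a 38.87% success rate on BIRD-CRITIC-PG and 33.33% on BIRD-CRITIC-Multi.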
SQL-Rewind and f-Plan Boosting improve debugging performance.
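The findings above are stated in terms of a success rate over debugging tasks. As a rough illustration only, the sketch below shows one way such a rate could be computed by replaying each candidate fix in a fresh database and comparing its result rows to the expected ones. Everything here is an assumption made for illustration: the `DebugTask` fields, the order-insensitive row comparison, and the use of in-memory SQLite as a stand-in for the PostgreSQL and multi-dialect environments. The benchmark's actual evaluation harness is not described in this summary and may differ.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class DebugTask:
    """Hypothetical task record; field names are illustrative, not the benchmark's."""
    setup_sql: str        # statements that build the task's database
    candidate_sql: str    # the model's proposed fix
    expected_rows: list   # rows a correct fix should return


def is_solved(task: DebugTask) -> bool:
    """Replay the task in a fresh in-memory database and compare result rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(task.setup_sql)
        rows = conn.execute(task.candidate_sql).fetchall()
    except sqlite3.Error:
        return False  # a fix that fails to execute counts as unsolved
    finally:
        conn.close()
    # Assumption: row order does not matter for this check.
    return sorted(rows) == sorted(task.expected_rows)


def success_rate(tasks: list) -> float:
    """Percentage of tasks whose candidate fix passes the check."""
    if not tasks:
        return 0.0
    return 100.0 * sum(is_solved(t) for t in tasks) / len(tasks)


SETUP = """
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL);
INSERT INTO orders VALUES (1, 'ann', 10.0), (2, 'ann', 5.0), (3, 'bob', 7.0);
"""
EXPECTED = [("ann", 15.0), ("bob", 7.0)]

tasks = [
    # Correct fix: groups by customer.
    DebugTask(SETUP, "SELECT customer, SUM(amount) FROM orders GROUP BY customer", EXPECTED),
    # Runs in SQLite but returns a single row, so the result is wrong.
    DebugTask(SETUP, "SELECT customer, SUM(amount) FROM orders", EXPECTED),
    # Misspelled column name, so execution fails.
    DebugTask(SETUP, "SELECT customer, SUM(amnt) FROM orders GROUP BY customer", EXPECTED),
    # Correct fix written differently.
    DebugTask(SETUP, "SELECT customer, SUM(amount) AS total FROM orders "
                     "GROUP BY customer ORDER BY customer", EXPECTED),
]

print(f"{success_rate(tasks):.2f}%")
```

Two of the four toy candidates pass, so the script reports 50.00%. Checking by execution rather than by string match lets differently written but equivalent fixes count as solved, which is why the fourth candidate passes.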
Abstract
Resolution of complex SQL issues persists as a significant bottleneck in real-world database applications. Current Large Language Models (LLMs), while adept at text-to-SQL translation, have not been rigorously evaluated on the more challenging task of debugging SQL issues. To address this gap, we introduce BIRD-CRITIC, a new SQL issue debugging benchmark comprising 530 PostgreSQL tasks (BIRD-CRITIC-PG) and 570 multi-dialect tasks (BIRD-CRITIC-Multi), distilled from authentic user issues and replayed within new environments to facilitate rigorous evaluation. Baseline evaluations underscore the task's complexity, with the leading reasoning model O3-Mini achieving only 38.87% success rate on BIRD-CRITIC-PG and 33.33% on BIRD-CRITIC-Multi. Meanwhile, advancing open-source models for database tasks is crucial for empowering local development while safeguarding data privacy. Therefore, we…
Taxonomy
Topics: Service-Oriented Architecture and Web Services · Distributed and Parallel Computing Systems · Scientific Computing and Data Management
Methods: Layer Normalization · Dropout · Absolute Position Encodings · Dense Connections · Byte Pair Encoding · Softmax · Label Smoothing · Transformer · GPT-4
