Residual Skill Optimization for Text-to-SQL Ensembles

Jiongli Zhu; Haoquan Guan; Parjanya Prajakta Prashant; Nikki Lijing Kuang; Seyedeh Baharan Khatami; Canwen Xu; Xiaodong Yu; Yingyu Lin; Zhewei Yao; Yuxiong He; Babak Salimi

arXiv:2605.21792 · cs.CL · May 22, 2026

TL;DR

This paper introduces DivSkill-SQL, a residual skill optimization framework that raises Text-to-SQL ensemble accuracy by building diverse, complementary skills without model fine-tuning, with selected-accuracy gains of up to +11.1 points (Spider2-Lite, Snowflake) across multiple datasets and base models.

Contribution

DivSkill-SQL is a novel framework that optimizes each new SQL generation skill on the examples the current skill ensemble fails on, yielding complementary skills that improve ensemble performance without retraining or fine-tuning.
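Why training on failure cases targets the ensemble's Pass@K can be seen in a short derivation. This is an illustrative sketch under a simplifying assumption: each skill s contributes one candidate whose correctness on example x_i is a fixed indicator c_s(x_i) in {0, 1}. The notation (c_s, F(S), N) is ours, not the paper's, and the paper's own result may be stated differently.

```latex
% Empirical Pass@K of an ensemble S of K skills over N examples
\mathrm{Pass@}K(S) = \frac{1}{N}\sum_{i=1}^{N}\Big(1 - \prod_{t \in S}\big(1 - c_t(x_i)\big)\Big)

% Marginal gain from adding one skill s to S
\mathrm{Pass@}(K{+}1)\big(S \cup \{s\}\big) - \mathrm{Pass@}K(S)
  = \frac{1}{N}\sum_{i=1}^{N}\Big[\prod_{t \in S}\big(1 - c_t(x_i)\big)\Big]\, c_s(x_i)
  = \frac{1}{N}\sum_{i \in F(S)} c_s(x_i),

% where F(S) is the residual (failure) set of the current ensemble
F(S) = \{\, i : c_t(x_i) = 0 \ \ \forall t \in S \,\}
```

Under this assumption, examples the ensemble already solves contribute nothing to the gain, so maximizing the new skill's accuracy on F(S) is the same as maximizing its marginal contribution to Pass@K.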

Findings

01

Up to +11.1 points selected accuracy on Spider2-Lite (Snowflake) over the strongest ensemble baseline

02

Up to +8.3 points selected accuracy on Spider2-Lite (BigQuery) over the strongest ensemble baseline

03

Fewer hallucinated schema references and function calls

Abstract

Text-to-SQL ensembles improve over single-candidate generation by drawing multiple SQL candidates and selecting one, but their effectiveness is bounded by Pass@K, the probability that at least one of K candidates is correct. Existing methods source diversity heuristically through stochastic decoding or prompt variants, leaving candidate sets dominated by correlated failures. We present DivSkill-SQL, a residual skill optimization framework that builds complementary agentic Text-to-SQL ensembles without model fine-tuning: each new skill is optimized on examples the current skill ensemble fails on, provably targeting its marginal contribution to Pass@K. On Spider2-Lite, DivSkill-SQL improves selected accuracy by up to +11.1 points on Snowflake and +8.3 on BigQuery over the strongest ensemble baseline, with consistent gains across two base models (Opus-4.6 and GPT-5.4). Skills optimized on…
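To make the Pass@K bound and the residual idea concrete, here is a minimal Python sketch under simplifying assumptions: each skill yields one deterministic SQL candidate whose correctness is already known, and "optimizing" a new skill is replaced by picking from a fixed pool of hypothetical skills. The skill names A, B, C and all helper functions are illustrative, not the paper's code or API. DivSkill-SQL itself optimizes agentic skills without fine-tuning; this toy only shows why scoring a candidate skill against the current ensemble's failures, rather than its standalone accuracy, targets Pass@K.

```python
from typing import Dict, List, Set

# correct[skill][i] is True if that skill's SQL candidate for example i is correct.
# Assumption: one deterministic candidate per skill, correctness known in advance.
Correctness = Dict[str, List[bool]]


def pass_at_k(correct: Correctness, ensemble: List[str], n: int) -> float:
    """Fraction of the n examples where at least one ensemble member is correct."""
    if n == 0:
        return 0.0
    hits = sum(any(correct[s][i] for s in ensemble) for i in range(n))
    return hits / n


def residual_set(correct: Correctness, ensemble: List[str], n: int) -> Set[int]:
    """Indices of examples that every current ensemble member gets wrong."""
    return {i for i in range(n) if not any(correct[s][i] for s in ensemble)}


def marginal_gain(correct: Correctness, ensemble: List[str], skill: str, n: int) -> float:
    """Pass@K increase from adding `skill`: its correct answers on the residual set, over n."""
    if n == 0:
        return 0.0
    return sum(correct[skill][i] for i in residual_set(correct, ensemble, n)) / n


def greedy_residual_ensemble(correct: Correctness, pool: List[str], k: int, n: int) -> List[str]:
    """Toy stand-in for residual optimization: k times, add the pool skill
    that solves the most examples the current ensemble still fails on."""
    ensemble: List[str] = []
    for _ in range(k):
        candidates = [s for s in pool if s not in ensemble]
        if not candidates:
            break
        best = max(candidates, key=lambda s: marginal_gain(correct, ensemble, s, n))
        ensemble.append(best)
    return ensemble


# Hypothetical data: A solves only a subset of what B solves (correlated failures),
# while C solves exactly the two examples B misses.
correct: Correctness = {
    "A": [True, True, True, False, False, False],
    "B": [True, True, True, True, False, False],
    "C": [False, False, False, False, True, True],
}
ensemble = greedy_residual_ensemble(correct, ["A", "B", "C"], k=2, n=6)
print(ensemble)                                        # ['B', 'C']
print(pass_at_k(correct, ensemble, n=6))               # 1.0
print(round(pass_at_k(correct, ["B", "A"], n=6), 3))   # 0.667
```

Ranking skills by standalone accuracy would pick B and A, whose failures overlap, for a Pass@2 of 4/6; scoring against the residual set picks C instead and covers all six examples.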
