Boundary-Aware NL2SQL: Integrating Reliability through Hybrid Reward and Data Synthesis
Songsong Tian, Kongsheng Zhuo, Zhendong Wang, Rong Shen, Shengtao Zhang, Yong Wu

TL;DR
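BAR-SQL is an NL2SQL training framework that builds reliability and boundary awareness into generation through Seed Mutation data synthesis, knowledge-grounded Chain-of-Thought reasoning, and a Task-Conditioned Hybrid Reward applied during reinforcement learning. It reports 91.48% SQL accuracy on Ent-SQL-Bench and abstains on boundary cases such as ambiguous queries.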
Contribution
BAR-SQL introduces a boundary-aware training framework that combines Seed Mutation data synthesis, Knowledge-Grounded Reasoning Synthesis for interpretability, and a Task-Conditioned Hybrid Reward, with the aim of making NL2SQL models more reliable.
Findings
Achieves 91.48% SQL accuracy on Ent-SQL-Bench
Outperforms proprietary models like Claude 4.5 Sonnet and GPT-5
Demonstrates effective boundary-aware abstention on ambiguous queries
Abstract
In this paper, we present BAR-SQL (Boundary-Aware Reliable NL2SQL), a unified training framework that embeds reliability and boundary awareness directly into the generation process. We introduce a Seed Mutation data synthesis paradigm that constructs a representative enterprise corpus, explicitly encompassing multi-step analytical queries alongside boundary cases including ambiguity and schema limitations. To ensure interpretability, we employ Knowledge-Grounded Reasoning Synthesis, which produces Chain-of-Thought traces explicitly anchored in schema metadata and business rules. The model is trained through a two-stage process: Supervised Fine-Tuning (SFT) followed by Reinforcement Learning via Group Relative Policy Optimization. We design a Task-Conditioned Hybrid Reward mechanism that simultaneously optimizes SQL execution accuracy, leveraging Abstract Syntax Tree analysis and dense…
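The abstract is cut off before it specifies the reward, so the following is only a rough sketch of how a task-conditioned reward could feed group-relative advantages, not the authors' implementation. The function names (`hybrid_reward`, `group_relative_advantages`), the `"sql"`/`"boundary"` task labels, and the binary scoring are all assumptions made for illustration; the Abstract Syntax Tree terms mentioned in the abstract are omitted because their definition is not given here. The advantage step follows the usual Group Relative Policy Optimization idea of normalizing each sampled response's reward against the mean and standard deviation of its group (population standard deviation is a choice made for this sketch).

```python
from statistics import mean, pstdev


def hybrid_reward(task, abstained, predicted_rows=None, reference_rows=None):
    """Hypothetical task-conditioned reward (binary simplification)."""
    if task == "boundary":
        # Boundary case (ambiguity, schema limitation): reward abstaining.
        return 1.0 if abstained else 0.0
    # Answerable case: abstaining earns nothing.
    if abstained:
        return 0.0
    # Order-insensitive comparison of execution results.
    same = sorted(map(tuple, predicted_rows)) == sorted(map(tuple, reference_rows))
    return 1.0 if same else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of samples for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every sample gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled responses to one answerable question (made-up data).
reference = [(1, "a"), (2, "b")]
group = [
    {"abstained": False, "rows": [(2, "b"), (1, "a")]},  # correct, reordered
    {"abstained": False, "rows": [(1, "a")]},            # missing a row
    {"abstained": True, "rows": None},                   # wrongly abstained
    {"abstained": False, "rows": [(1, "a"), (2, "b")]},  # correct
]
rewards = [hybrid_reward("sql", g["abstained"], g["rows"], reference) for g in group]
advantages = group_relative_advantages(rewards)
print(rewards)
print([round(a, 3) for a in advantages])
```

In this toy group the two correct responses receive positive advantages and the incorrect or wrongly abstaining ones receive negative advantages, which is the signal a policy-gradient update would then use.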
Taxonomy
Topics: Software System Performance and Reliability · Cloud Computing and Resource Management · Advanced Database Systems and Queries
