NL2SQLBench: A Modular Benchmarking Framework for LLM-Enabled NL2SQL Solutions

Shizheng Hou; Wenqi Pei; Nuo Chen; Quang-Trung Ta; Peng Lu; Beng Chin Ooi

arXiv:2604.16493 · cs.DB · April 21, 2026

TL;DR

NL2SQLBench is a modular benchmarking framework that evaluates LLM-enabled NL2SQL systems module by module, revealing significant gaps in accuracy and computational efficiency that can guide future improvements.

Contribution

The paper introduces a modular evaluation framework for NL2SQL systems, with novel fine-grained module-level metrics and a multi-agent setup for configurable benchmarking across diverse approaches and datasets.

Findings

1. Existing NL2SQL methods show substantial accuracy gaps.
2. Current approaches are computationally inefficient.
3. Benchmark datasets and evaluation rules have critical shortcomings.

Abstract

Natural Language to SQL (NL2SQL) technology empowers non-expert users to query relational databases without requiring SQL expertise. While large language models (LLMs) have greatly improved NL2SQL algorithms, their rapid development outpaces systematic evaluation, leaving a critical gap in understanding their effectiveness, efficiency, and limitations. To this end, we present NL2SQLBench, the first modular evaluation and benchmarking framework for LLM-enabled NL2SQL approaches. Specifically, we dissect NL2SQL systems into three core modules: Schema Selection, Candidate Generation, and Query Revision. For each module, we comprehensively review existing strategies and propose novel fine-grained metrics that systematically quantify module-level effectiveness and efficiency. We further implement these metrics in a flexible multi-agent framework, allowing configurable benchmarking across…
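To make the three-module decomposition concrete, the following is a minimal sketch of how Schema Selection, Candidate Generation, and Query Revision could be chained and how one module might be scored in isolation. Everything here is an assumption for illustration: the `Pipeline` class, the toy keyword-based modules, and the precision/recall score for schema selection are hypothetical and are not the paper's actual interfaces or metrics.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

# Hypothetical representation: table name -> list of column names.
Schema = Dict[str, List[str]]


@dataclass
class Pipeline:
    """Chains the three modules named in the abstract (interfaces are assumed)."""
    select_schema: Callable[[str, Schema], Schema]
    generate_candidates: Callable[[str, Schema], List[str]]
    revise_query: Callable[[str, List[str]], str]

    def run(self, question: str, schema: Schema) -> str:
        sub_schema = self.select_schema(question, schema)
        candidates = self.generate_candidates(question, sub_schema)
        return self.revise_query(question, candidates)


def schema_columns(schema: Schema) -> Set[str]:
    """Flatten a schema into a set of 'table.column' identifiers."""
    return {f"{t}.{c}" for t, cols in schema.items() for c in cols}


def schema_selection_scores(selected: Schema, gold: Schema) -> Dict[str, float]:
    """Illustrative module-level score: column precision and recall."""
    sel, ref = schema_columns(selected), schema_columns(gold)
    hits = len(sel & ref)
    return {
        "precision": hits / len(sel) if sel else 0.0,
        "recall": hits / len(ref) if ref else 0.0,
    }


# Toy stand-ins for the modules; a real system would call an LLM here.
def keyword_schema_selector(question: str, schema: Schema) -> Schema:
    words = set(question.lower().split())
    return {t: cols for t, cols in schema.items() if t in words}


def template_generator(question: str, schema: Schema) -> List[str]:
    words = set(question.lower().split())
    return [
        f"SELECT {c} FROM {t}"
        for t, cols in schema.items()
        for c in cols
        if c in words
    ]


def first_candidate_reviser(question: str, candidates: List[str]) -> str:
    return candidates[0] if candidates else ""


pipeline = Pipeline(keyword_schema_selector, template_generator, first_candidate_reviser)

full_schema = {"singer": ["name", "age"], "concert": ["venue", "year"]}
question = "list the name of every singer"

selected = keyword_schema_selector(question, full_schema)
scores = schema_selection_scores(selected, gold={"singer": ["name"]})
final_sql = pipeline.run(question, full_schema)
```

Scoring the selected schema separately from the final SQL is the point of the sketch: a module can be swapped or measured on its own without rerunning the whole pipeline.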
