SpotIt+: Verification-based Text-to-SQL Evaluation with Database Constraints

Andrew Tremante; Yang He; Rocky Klopfenstein; Yuepeng Wang; Nina Narodytska; Haoze Wu

arXiv:2603.04334 · cs.DB · May 13, 2026


TL;DR

SpotIt+ is an open-source verification tool that evaluates Text-to-SQL systems by actively searching for database instances on which a generated query and the ground-truth query return different results. The search is guided by database constraints that are mined with rules and validated by an LLM.

Contribution

The paper introduces a best-effort constraint-mining pipeline that combines rule-based specification mining with LLM-based validation over example databases, so that the differentiating databases used for Text-to-SQL evaluation are realistic.
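The two-stage idea (rule-based mining, then validation) can be illustrated with a minimal sketch. This is not the SpotIt+ pipeline: the constraint kinds, the tuple encoding, and the names `mine_candidates`, `validate`, and `toy_judge` are all assumptions made for illustration, and `toy_judge` is a hard-coded stand-in for what would be an LLM call.

```python
def mine_candidates(rows, columns):
    """Propose not-null, uniqueness, and range constraints that hold on the example rows."""
    candidates = []
    for idx, name in enumerate(columns):
        values = [row[idx] for row in rows]
        non_null = [v for v in values if v is not None]
        if len(non_null) == len(values):
            candidates.append(("not_null", name))
        if non_null and len(set(non_null)) == len(non_null):
            candidates.append(("unique", name))
        if non_null and all(isinstance(v, (int, float)) for v in non_null):
            candidates.append(("range", name, min(non_null), max(non_null)))
    return candidates


def validate(candidates, judge):
    """Keep only the candidates the judge accepts (the judge stands in for an LLM)."""
    return [c for c in candidates if judge(c)]


def toy_judge(candidate):
    # Hypothetical stand-in for LLM validation: reject uniqueness on 'age',
    # which holds on the sample only by coincidence.
    return candidate[:2] != ("unique", "age")


# Example data (invented for this sketch).
example_rows = [(1, 34), (2, 51), (3, 27)]
candidates = mine_candidates(example_rows, ("id", "age"))
kept = validate(candidates, toy_judge)
```

On this sample the rules propose that `age` is unique, because no two example rows share an age. A validation step is what filters out such coincidental patterns, which is why the mining is described as best-effort.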

Findings

01. SpotIt+ uncovers discrepancies between generated and gold SQL queries that standard test-based evaluation misses.

02. Mined constraints lead to more realistic differentiating databases.

03. On the BIRD dataset, adding the mined constraints preserves SpotIt+'s ability to find these discrepancies efficiently.

Abstract

We present SpotIt+, an open-source tool for evaluating Text-to-SQL systems via bounded equivalence verification. Given a generated SQL query and the ground truth, SpotIt+ actively searches for database instances that differentiate the two queries. To ensure that the generated counterexamples reflect practically relevant discrepancies, we introduce a best-effort constraint-mining pipeline that combines rule-based specification mining with LLM-based validation over example databases. Experimental results on the BIRD dataset show that the mined constraints enable SpotIt+ to generate more realistic differentiating databases, while preserving its ability to efficiently uncover numerous discrepancies between generated and gold SQL queries that are missed by standard test-based evaluation.
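The idea of searching for a differentiating database under constraints can be sketched with a brute-force enumeration over tiny tables. This is only an illustration under stated assumptions: it is not how SpotIt+ performs bounded equivalence verification, and the `emp` schema, the candidate age values, the queries, and all function names are hypothetical.

```python
import itertools
import sqlite3


def run_query(rows, sql):
    """Load rows into a fresh in-memory table and return the sorted query result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emp (id INTEGER, age INTEGER)")
    conn.executemany("INSERT INTO emp VALUES (?, ?)", rows)
    result = sorted(conn.execute(sql).fetchall())
    conn.close()
    return result


def find_differentiating_db(gold_sql, generated_sql, constraint,
                            max_rows=2, ages=(-1, 17, 18, 30)):
    """Enumerate tables of up to max_rows rows and return the first one that
    satisfies the constraint and on which the two queries disagree."""
    for n in range(max_rows + 1):
        for combo in itertools.product(ages, repeat=n):
            rows = [(i + 1, age) for i, age in enumerate(combo)]
            if not constraint(rows):
                continue  # skip databases that violate the constraint
            if run_query(rows, gold_sql) != run_query(rows, generated_sql):
                return rows
    return None  # no difference found within the bound


def no_constraint(rows):
    return True


def age_non_negative(rows):
    # Assumed example of a mined constraint: ages are never negative.
    return all(age >= 0 for _, age in rows)


GOLD = "SELECT id FROM emp WHERE age >= 18"
GENERATED = "SELECT id FROM emp WHERE age > 18"
ODD_BUT_HARMLESS = "SELECT id FROM emp WHERE age >= 18 OR age < 0"

real_bug = find_differentiating_db(GOLD, GENERATED, no_constraint)
unrealistic = find_differentiating_db(GOLD, ODD_BUT_HARMLESS, no_constraint)
constrained = find_differentiating_db(GOLD, ODD_BUT_HARMLESS, age_non_negative)
```

Without constraints, the search separates `GOLD` from `ODD_BUT_HARMLESS` only by using a row with a negative age, which no real database would contain. With the `age_non_negative` constraint, no differentiating database exists within the bound, while the genuine boundary bug in `GENERATED` is still exposed by a row with age 18.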
