An Efficient and Effective Evaluator for Text2SQL Models on Unseen and Unlabeled Data
Trinh Pham, Thanh Tam Nguyen, Viet Huynh, Hongzhi Yin, Quoc Viet Hung Nguyen

TL;DR
FusionSQL is a novel evaluation method that estimates Text2SQL system accuracy on unseen, unlabeled data by analyzing output patterns, enabling early detection of quality issues without requiring labeled datasets.
Contribution
It introduces a model-agnostic approach to assess Text2SQL performance on unlabeled data, addressing a key deployment challenge.
Findings
FusionSQL accurately tracks actual system accuracy.
It reliably detects quality decline in real-world scenarios.
It supports continuous monitoring and pre-release checks.
Abstract
Recent advances in large language models have strengthened Text2SQL systems that translate natural language questions into database queries. A persistent deployment challenge is assessing a newly trained Text2SQL system on an unseen and unlabeled dataset when no verified answers are available. This situation arises frequently because database content and structure evolve, privacy policies slow manual review, and carefully written SQL labels are costly and time-consuming to produce. Without timely evaluation, organizations cannot approve releases or detect failures early. FusionSQL addresses this gap by working with any Text2SQL model and estimating accuracy without reference labels, allowing teams to measure quality on unseen and unlabeled datasets. It analyzes patterns in the system's own outputs to characterize how the target dataset differs from the material used during training. FusionSQL…
Taxonomy
Topics: Topic Modeling · Mobile Crowdsensing and Crowdsourcing · Natural Language Processing Techniques
