PolySQL: Scaling Text-to-SQL Evaluation Across SQL Dialects via Automated Backend Isomorphism

Yotam Perlitz; Elad Venezian; Corentin Royer; Francesco Fusco; Andrea Giovannini

arXiv:2605.07796 · cs.CL · May 11, 2026

TL;DR

PolySQL introduces a novel method for evaluating text-to-SQL models across different SQL dialects without manual query translation, revealing significant performance gaps and dialect-specific challenges.

Contribution

The paper presents a dual-execution approach to cross-dialect evaluation, together with datasets and a framework for large-scale, accurate benchmarking of robustness across SQL dialects.

Findings

01. SQLite performance does not reliably indicate performance on other dialects.

02. Cross-dialect evaluation shows an average 10.1% accuracy drop from SQLite to the other dialects.

03. Most errors are logical rather than syntactic.
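Finding 01 is a claim about per-query agreement, which the abstract reports as Cohen's kappa. As a reminder of what that statistic measures, here is a minimal sketch that computes kappa from two lists of per-query correctness labels (1 = correct, 0 = incorrect). The function name and the label vectors are hypothetical toy values chosen for illustration; they are not results from the paper.

```python
def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of binary labels (0 or 1)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of queries with the same outcome on both dialects.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each dialect's marginal rate of correct answers.
    p_a = sum(labels_a) / n
    p_b = sum(labels_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1.0:
        return float("nan")  # kappa is undefined when chance agreement is total
    return (observed - expected) / (1 - expected)


# Toy per-query outcomes (assumed, not taken from the paper).
sqlite_correct = [1, 1, 1, 1, 0, 0, 1, 1, 0, 1]
other_correct = [1, 0, 1, 1, 0, 1, 1, 0, 0, 1]

kappa = cohens_kappa(sqlite_correct, other_correct)
print(f"kappa = {kappa:.3f}")
```

In this toy case the two dialects agree on 7 of 10 queries, yet kappa stays well below 1 because much of that agreement is expected by chance. This is why similar aggregate accuracy can coexist with weak per-query agreement.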

Abstract

SQL dialects vary in syntax, types, and functions across database engines. Text-to-SQL benchmarks, however, predominantly support only SQLite. This creates a critical evaluation gap: cross-dialect evaluation reveals weak per-query agreement (Cohen's κ), showing that SQLite performance is an unreliable proxy for other dialects. Yet such evaluation remains prohibitively difficult: existing approaches either require expensive manual query transpilation or rely on tools that often fail on complex SQL. To close this gap, we introduce PolySQL, a novel dual-execution method that eliminates the need for query transpilation by comparing normalized execution results. Notably, our approach achieves higher evaluation fidelity than query transpilation with 100% query coverage. PolySQL comprises three datasets, enabling the first large-scale cross-dialect study. Our study reveals a 10.1% average…
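The abstract describes comparing normalized execution results instead of transpiling queries. The sketch below illustrates that general idea only and is not the authors' implementation. The helper names (`normalize_value`, `normalize_rows`, `results_match`), the normalization rules, and the toy table are all assumptions. The rows labelled as coming from another backend are hand-written stand-ins for what a different database driver might return, such as `Decimal` and `bool` values.

```python
import sqlite3
from collections import Counter
from datetime import date, datetime
from decimal import Decimal


def normalize_value(value, float_digits=6):
    """Map driver-specific Python values onto one comparable canonical form."""
    if value is None:
        return None
    if isinstance(value, bool):  # checked before numbers: bool subclasses int
        return float(int(value))
    if isinstance(value, (int, float, Decimal)):
        return round(float(value), float_digits)
    if isinstance(value, (datetime, date)):
        return value.isoformat()
    if isinstance(value, bytes):
        return value.decode("utf-8", errors="replace")
    return str(value)


def normalize_rows(rows, float_digits=6):
    """Turn a result set into a multiset of normalized rows (row order ignored)."""
    return Counter(
        tuple(normalize_value(v, float_digits) for v in row) for row in rows
    )


def results_match(gold_rows, pred_rows, float_digits=6):
    """True when both result sets hold the same rows after normalization."""
    return normalize_rows(gold_rows, float_digits) == normalize_rows(
        pred_rows, float_digits
    )


# Gold query executed on SQLite (toy schema, assumed for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER, amount REAL, paid INTEGER);
    INSERT INTO orders VALUES (1, 10.5, 1), (2, 4.25, 0), (3, 10.5, 1);
    """
)
gold_rows = conn.execute(
    "SELECT paid, SUM(amount) FROM orders GROUP BY paid"
).fetchall()
conn.close()

# Hand-written stand-ins for rows another engine's driver might return.
other_rows = [(True, Decimal("21.00")), (False, Decimal("4.250"))]
wrong_rows = [(True, Decimal("10.50")), (False, Decimal("4.250"))]

print(results_match(gold_rows, other_rows))
print(results_match(gold_rows, wrong_rows))
```

The multiset comparison deliberately ignores row order, which suits queries without `ORDER BY`. A query whose correctness depends on ordering would need an order-sensitive comparison instead.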
