ETM: Modern Insights into Perspective on Text-to-SQL Evaluation in the Age of Large Language Models

Benjamin G. Ascoli; Yasoda Sai Ram Kandikonda; Jinho D. Choi

arXiv:2407.07313 · cs.CL · June 18, 2025 · 1 citation

TL;DR

This paper introduces ETM, a new evaluation metric for Text-to-SQL tasks that addresses limitations of existing metrics by combining syntactic and semantic analysis, leading to more accurate performance assessment especially for large language models.

Contribution

The paper presents ETM, a novel evaluation metric for Text-to-SQL that reduces false positives and negatives compared to traditional metrics, and provides an open-source implementation.

Findings

01. ETM significantly lowers false positive and negative rates compared to EXE and ESM.

02. Evaluation of nine LLM-based models shows ETM's robustness over traditional metrics.

03. Open-source ETM script enhances evaluation reliability in Text-to-SQL research.

Abstract

The task of Text-to-SQL enables anyone to retrieve information from SQL databases using natural language. While this task has made substantial progress, the two primary evaluation metrics - Execution Accuracy (EXE) and Exact Set Matching Accuracy (ESM) - suffer from inherent limitations that can misrepresent performance. Specifically, ESM's rigid matching overlooks semantically correct but stylistically different queries, whereas EXE can overestimate correctness by ignoring structural errors that yield correct outputs. These shortcomings become especially problematic when assessing outputs from large language model (LLM)-based approaches without fine-tuning, which vary more in style and structure compared to their fine-tuned counterparts. Thus, we introduce a new metric, Enhanced Tree Matching (ETM), which mitigates these issues by comparing queries using both syntactic and semantic…
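
The two failure modes described in the abstract can be illustrated with a toy example. The sketch below is not the authors' ETM script and does not implement the real ESM: `execution_match` is a simplified EXE-style check, `naive_string_match` is a deliberately crude stand-in for rigid matching, and the `singer` table and all queries are hypothetical.

```python
import sqlite3


def execution_match(conn, gold_sql, pred_sql):
    """Simplified EXE-style check: both queries return the same rows."""
    gold_rows = sorted(conn.execute(gold_sql).fetchall())
    pred_rows = sorted(conn.execute(pred_sql).fetchall())
    return gold_rows == pred_rows


def naive_string_match(gold_sql, pred_sql):
    """Crude stand-in for rigid matching (NOT the real ESM):
    equality after lowercasing and collapsing whitespace."""
    def normalize(sql):
        return " ".join(sql.lower().split())
    return normalize(gold_sql) == normalize(pred_sql)


# Hypothetical toy database.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    INSERT INTO singer VALUES (1, 'Ann', 30), (2, 'Bob', 45);
    """
)

gold = "SELECT name FROM singer WHERE age > 40"
wrong = "SELECT name FROM singer WHERE age > 35"  # different condition
equiv = "SELECT name FROM singer WHERE 40 < age"  # same meaning, different style

# Execution-based false positive: the wrong query happens to return
# the same rows on this particular database instance.
exe_false_positive = execution_match(conn, gold, wrong)

# Rigid-matching false negative: an equivalent query is rejected
# because it is written differently.
strict_false_negative = not naive_string_match(gold, equiv)

# Adding one row exposes the difference that execution missed.
conn.execute("INSERT INTO singer VALUES (3, 'Cy', 38)")
exe_after_new_row = execution_match(conn, gold, wrong)

print(exe_false_positive, strict_false_negative, exe_after_new_row)
```

On this toy data the execution check accepts the wrong query until a distinguishing row is added, while the string check rejects a query that is equivalent to the gold one. This is the kind of gap that a metric combining syntactic and semantic comparison is meant to narrow.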
