ETM: Modern Insights into Perspective on Text-to-SQL Evaluation in the Age of Large Language Models
Benjamin G. Ascoli, Yasoda Sai Ram Kandikonda, Jinho D. Choi

TL;DR
This paper introduces ETM, a new evaluation metric for Text-to-SQL that addresses limitations of existing metrics by combining syntactic and semantic analysis, giving a more accurate assessment of performance, especially for large language models.
Contribution
The paper presents ETM, a novel evaluation metric for Text-to-SQL that reduces false positives and negatives compared to traditional metrics, and provides an open-source implementation.
Findings
ETM significantly lowers false positive and negative rates compared to EXE and ESM.
An evaluation of nine LLM-based models shows that ETM is more robust than the traditional metrics.
The open-source ETM script makes evaluation in Text-to-SQL research more reliable.
Abstract
The task of Text-to-SQL enables anyone to retrieve information from SQL databases using natural language. While this task has made substantial progress, the two primary evaluation metrics - Execution Accuracy (EXE) and Exact Set Matching Accuracy (ESM) - suffer from inherent limitations that can misrepresent performance. Specifically, ESM's rigid matching overlooks semantically correct but stylistically different queries, whereas EXE can overestimate correctness by ignoring structural errors that yield correct outputs. These shortcomings become especially problematic when assessing outputs from large language model (LLM)-based approaches without fine-tuning, which vary more in style and structure compared to their fine-tuned counterparts. Thus, we introduce a new metric, Enhanced Tree Matching (ETM), which mitigates these issues by comparing queries using both syntactic and semantic…
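The two failure modes described in the abstract can be illustrated with a small sketch. This is not the authors' ETM script or the official EXE/ESM implementations: the `singer` table, its rows, and the helper names `execution_match` and `naive_exact_match` are all hypothetical, and the normalized string comparison is only a crude stand-in for strict structural matching. The sketch shows an execution-based check accepting a structurally wrong query (a false positive) and a strict match rejecting a stylistic variant of the gold query (a false negative).

```python
import sqlite3


def run(conn, sql):
    # Sort the rows so the comparison ignores row order.
    return sorted(conn.execute(sql).fetchall())


def execution_match(conn, gold, pred):
    # Execution-style check: compare result sets on one database instance.
    return run(conn, gold) == run(conn, pred)


def naive_exact_match(gold, pred):
    # Crude stand-in for strict matching: case- and whitespace-insensitive
    # string equality. Real ESM compares parsed clause sets, not raw strings.
    def norm(query):
        return " ".join(query.lower().split())
    return norm(gold) == norm(pred)


# Hypothetical toy database, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER, country TEXT)")
conn.executemany(
    "INSERT INTO singer VALUES (?, ?, ?)",
    [("Ana", 35, "Spain"), ("Bo", 25, "Norway"), ("Cy", 41, "France")],
)

gold = "SELECT name FROM singer WHERE age > 30"
# Same meaning as gold, written with a table alias.
style_variant = "SELECT s.name FROM singer AS s WHERE s.age > 30"
# Different meaning: the extra OR condition happens not to matter on this data.
wrong_pred = "SELECT name FROM singer WHERE age > 30 OR country = 'France'"

print("execution accepts wrong query:", execution_match(conn, gold, wrong_pred))
print("strict match accepts variant:", naive_exact_match(gold, style_variant))
print("execution accepts variant:", execution_match(conn, gold, style_variant))
```

On this data the wrong query passes the execution check only because the sole French singer is also over 30; a database containing a younger French singer would expose the difference, which is why a result-only comparison can overestimate correctness.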
Taxonomy
Topics: Topic Modeling
Methods: Sparse Evolutionary Training
