TrustSQL: Benchmarking Text-to-SQL Reliability with Penalty-Based Scoring
Gyubok Lee, Woosog Chay, Seonhee Cho, Edward Choi

TL;DR
TrustSQL introduces a benchmark and scoring system to evaluate and improve the reliability of text-to-SQL models, emphasizing correct query generation and abstention from infeasible questions to foster safer deployment.
Contribution
The paper presents TrustSQL, a comprehensive benchmark with a penalty-based scoring metric for assessing and enhancing the reliability of text-to-SQL models, including new evaluation approaches.
Findings
Existing methods struggle under severe penalties, indicating room for improvement.
Unified models can be evaluated effectively with the new penalty-based metric.
Achieving high reliability scores requires significant effort in model development.
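To make the idea of penalty-based scoring concrete, here is a minimal sketch in Python. It assumes a rule of the following shape: a correct SQL query on a feasible question, or an abstention on an infeasible one, earns +1; abstaining on a feasible question earns 0; any other answer (wrong SQL, or SQL for an infeasible question) costs a configurable penalty. The names `Example`, `example_score`, and `reliability_score` are hypothetical, and the exact rule and penalty values used by TrustSQL are defined in the paper, not here.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Example:
    """One evaluated question (hypothetical structure for illustration)."""
    feasible: bool         # can the question be answered with SQL on this database?
    answered: bool         # True if the model produced SQL, False if it abstained
    correct: bool = False  # was the produced SQL correct? Ignored when abstaining.


def example_score(ex: Example, penalty: float) -> float:
    """Score a single example under the assumed penalty rule."""
    if ex.feasible:
        if not ex.answered:
            return 0.0  # abstaining on an answerable question: no reward, no penalty
        return 1.0 if ex.correct else -penalty
    # Infeasible question: abstaining is the right call, answering is penalized.
    return -penalty if ex.answered else 1.0


def reliability_score(examples: Sequence[Example], penalty: float) -> float:
    """Average per-example score over the evaluation set."""
    if not examples:
        raise ValueError("examples must not be empty")
    return sum(example_score(ex, penalty) for ex in examples) / len(examples)


if __name__ == "__main__":
    batch = [
        Example(feasible=True, answered=True, correct=True),
        Example(feasible=True, answered=True, correct=False),
        Example(feasible=True, answered=False),
        Example(feasible=False, answered=False),
        Example(feasible=False, answered=True),
    ]
    for c in (0, 1, 10):
        print(f"penalty={c}: score={reliability_score(batch, c):.2f}")
```

With a penalty of 0 the score reduces to the fraction of examples handled correctly, while larger penalties make every unflagged mistake increasingly costly, which is one way to read the finding that existing methods struggle under severe penalties.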
Abstract
Text-to-SQL enables users to interact with databases using natural language, simplifying the retrieval and synthesis of information. Despite the remarkable success of large language models (LLMs) in translating natural language questions into SQL queries, widespread deployment remains limited due to two primary challenges. First, the effective use of text-to-SQL models depends on users' understanding of the model's capabilities, that is, the scope of questions the model can correctly answer. Second, the absence of abstention mechanisms can lead to incorrect SQL generation going unnoticed, thereby undermining trust in the model's output. To enable wider deployment, it is crucial to address these challenges in model design and enhance model evaluation to build trust in the model's output. To this end, we introduce TrustSQL, a novel comprehensive benchmark designed to evaluate text-to-SQL…
Taxonomy
Topics: Cloud Data Security Solutions · Cloud Computing and Resource Management · Access Control and Trust
