Confidence Estimation for Error Detection in Text-to-SQL Systems
Oleg Somov, Elena Tutubalina

TL;DR
This paper explores confidence estimation and selective classification techniques to improve error detection and calibration in Text-to-SQL systems, enhancing their robustness and interpretability.
Contribution
It introduces entropy-based selective classifiers and calibration methods to improve error detection and confidence alignment in Text-to-SQL models, with empirical evaluation across different architectures.
Findings
Encoder-decoder T5 is better calibrated than GPT-4 and Llama 3.
Selective classifiers effectively detect errors in irrelevant questions.
Calibration techniques improve model confidence and accuracy alignment.
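The calibration findings above rest on comparing a model's stated confidence with its observed accuracy. One standard way to quantify that gap is expected calibration error (ECE). The sketch below is illustrative only: this summary does not say which metric the paper reports, and the function name, the equal-width binning, and the bin count are all assumptions.

```python
def expected_calibration_error(confidences, is_correct, n_bins=10):
    """ECE with equal-width bins (an assumed, commonly used formulation).

    confidences: per-query confidence scores in [0, 1]
    is_correct:  per-query booleans, True if the generated SQL was right
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, is_correct):
        # A confidence of exactly 1.0 falls into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        # Weight each bin's confidence/accuracy gap by its share of examples.
        ece += (len(bucket) / n) * abs(avg_conf - accuracy)
    return ece
```

A lower value means confidence tracks accuracy more closely; a model that reports 0.9 confidence while being right 75% of the time contributes a gap of 0.15 for that bin.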
Abstract
Text-to-SQL enables users to interact with databases through natural language, simplifying the retrieval and synthesis of information. Despite the success of large language models (LLMs) in converting natural language questions into SQL queries, their broader adoption is limited by two main challenges: achieving robust generalization across diverse queries and ensuring interpretative confidence in their predictions. To tackle these issues, our research investigates the integration of selective classifiers into Text-to-SQL systems. We analyse the trade-off between coverage and risk using entropy-based confidence estimation with selective classifiers and assess its impact on the overall performance of Text-to-SQL models. Additionally, we explore the models' initial calibration and improve it with calibration techniques for better alignment between confidence and accuracy. Our…
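To make the coverage/risk trade-off in the abstract concrete, here is a minimal sketch of an entropy-based selective classifier. It is not the authors' implementation: averaging per-token entropy over the generated query, the function names, and the fixed-threshold abstention rule are all assumptions chosen for illustration.

```python
import math


def mean_token_entropy(token_distributions):
    """Average Shannon entropy (in nats) over the per-token probability
    distributions of one generated SQL query. Using the mean as the
    aggregate is an assumption; other aggregates are possible."""
    entropies = [
        -sum(p * math.log(p) for p in dist if p > 0.0)
        for dist in token_distributions
    ]
    return sum(entropies) / len(entropies)


def risk_coverage_curve(entropies, is_correct):
    """Rank queries from most to least confident (lowest entropy first)
    and return (coverage, risk) pairs, where risk is the error rate
    among the queries the system chooses to answer."""
    order = sorted(range(len(entropies)), key=lambda i: entropies[i])
    points = []
    errors = 0
    for answered, i in enumerate(order, start=1):
        if not is_correct[i]:
            errors += 1
        points.append((answered / len(order), errors / answered))
    return points


def selective_predict(entropy, threshold):
    """Answer only when uncertainty is at or below the threshold;
    otherwise abstain. The threshold is a hypothetical tuning knob."""
    return entropy <= threshold
```

Lowering the threshold reduces coverage (fewer questions answered) in exchange for lower risk on the answered subset; the curve returned by `risk_coverage_curve` shows that trade-off across all possible thresholds.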
Taxonomy
Topics: Service-Oriented Architecture and Web Services
Methods: Attention Is All You Need · Discriminative Fine-Tuning · Cosine Annealing · Adam · Dropout · SentencePiece · Softmax · Byte Pair Encoding · Linear Layer
