ProbGate at EHRSQL 2024: Enhancing SQL Query Generation Accuracy through Probabilistic Threshold Filtering and Error Handling
Sangryul Kim, Donghee Han, Sehyun Kim

TL;DR
This paper presents a method to improve medical text-to-SQL accuracy by filtering out unanswerable questions with entropy measures and by executing generated queries on the database to catch errors.
Contribution
It introduces an entropy-based filtering approach and practical execution-based error mitigation for medical text-to-SQL tasks, enhancing answer reliability.
Findings
Effective filtering of unanswerable questions
Improved SQL query accuracy in medical domain
Method applicable without access to model parameters
Abstract
Recently, deep learning-based language models have significantly enhanced text-to-SQL tasks, with promising applications in retrieving patient records within the medical domain. One notable challenge in such applications is discerning unanswerable queries. By fine-tuning a model, we demonstrate the feasibility of converting medical record inquiries into SQL queries. Additionally, we introduce an entropy-based method to identify and filter out unanswerable results. We further enhance result quality by filtering low-confidence SQL based on the distribution of log probabilities, while grammatical and schema errors are mitigated by executing queries on the actual database. Our experiments verify that the method filters unanswerable questions, that it remains applicable even when the parameters of the model are not accessible, and that it is effective in practice.
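The two filtering stages described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of the maximum per-token entropy as the confidence score, the threshold value of 0.5, and the choice of SQLite as the database are all assumptions made for illustration. The sketch only needs per-token log probabilities of the kind many hosted model APIs return, so it does not require access to model parameters.

```python
import math
import sqlite3


def token_entropy(top_logprobs):
    """Shannon entropy (in nats) of one token's top-k candidates.

    The top-k probabilities are renormalized to sum to 1 (an assumption;
    the source does not specify how truncated distributions are handled).
    """
    probs = [math.exp(lp) for lp in top_logprobs]
    total = sum(probs)
    probs = [p / total for p in probs]
    return -sum(p * math.log(p) for p in probs if p > 0)


def is_confident(per_token_top_logprobs, threshold):
    """Hypothetical rule: accept only if the most uncertain token is below threshold."""
    worst = max((token_entropy(t) for t in per_token_top_logprobs),
                default=math.inf)  # no tokens at all counts as not confident
    return worst <= threshold


def executes(sql, conn):
    """Return True if the query runs on the database without an error."""
    try:
        conn.execute(sql).fetchall()
        return True
    except sqlite3.Error:
        # Covers syntax errors and references to missing tables or columns.
        return False


def gate(sql, per_token_top_logprobs, conn, threshold=0.5):
    """Return the SQL if it passes both filters, else None (treated as unanswerable)."""
    if not is_confident(per_token_top_logprobs, threshold):
        return None
    if not executes(sql, conn):
        return None
    return sql


# Toy database and hand-written log probabilities standing in for model output.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (id INTEGER, name TEXT)")

confident = [[math.log(0.99), math.log(0.01)]]   # sharply peaked distribution
uncertain = [[math.log(0.5), math.log(0.5)]]     # maximally uncertain over 2 options

kept = gate("SELECT name FROM patients", confident, conn)
dropped_entropy = gate("SELECT name FROM patients", uncertain, conn)
dropped_schema = gate("SELECT diagnosis FROM patients", confident, conn)
```

In this toy run, the first query passes both checks, the second is rejected by the entropy threshold, and the third is rejected because it references a column that does not exist. In practice the threshold would have to be tuned on held-out data rather than fixed in advance.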
Taxonomy
Topics: Advanced Database Systems and Queries · Scientific Computing and Data Management · Data Quality and Management
