SCARE: A Benchmark for SQL Correction and Question Answerability Classification for Reliable EHR Question Answering
Gyubok Lee, Woosog Chay, Edward Choi

TL;DR
SCARE is a comprehensive benchmark designed to evaluate post-hoc verification methods for SQL queries and question answerability in EHR question answering systems, addressing safety concerns in clinical AI deployment.
Contribution
The paper introduces SCARE, the first unified benchmark for assessing safety verification mechanisms in EHR QA systems, including question classification and SQL correction.
Findings
The evaluation identifies a trade-off between question classification accuracy and SQL error correction.
Benchmark includes 4,200 question-SQL-output triples from multiple models and datasets.
Experimental results highlight key challenges and future research directions.
Abstract
Recent advances in Large Language Models (LLMs) have enabled the development of text-to-SQL models that allow clinicians to query structured data stored in Electronic Health Records (EHRs) using natural language. However, deploying these models for EHR question answering (QA) systems in safety-critical clinical environments remains challenging: incorrect SQL queries, whether caused by model errors or problematic user inputs, can undermine clinical decision-making and jeopardize patient care. While prior work has mainly focused on improving SQL generation accuracy or filtering questions before execution, there is a lack of a unified benchmark for evaluating independent post-hoc verification mechanisms (i.e., a component that inspects and validates the generated SQL before execution), which is crucial for safe deployment. To fill this gap, we introduce SCARE, a benchmark for evaluating…
Taxonomy
Topics: Topic Modeling · Electronic Health Records Systems · Machine Learning in Healthcare
