SCARE: A Benchmark for SQL Correction and Question Answerability Classification for Reliable EHR Question Answering

Gyubok Lee; Woosog Chay; Edward Choi

arXiv:2511.17559·cs.CL·December 23, 2025

Open Access

TL;DR

SCARE is a comprehensive benchmark designed to evaluate post-hoc verification methods for SQL queries and question answerability in EHR question answering systems, addressing safety concerns in clinical AI deployment.

Contribution

The paper introduces SCARE, the first unified benchmark for assessing safety verification mechanisms in EHR QA systems, including question classification and SQL correction.

Findings

01. Identified a trade-off between question classification accuracy and SQL error correction.

02. Benchmark includes 4,200 question-SQL-output triples from multiple models and datasets.

03. Experimental results highlight key challenges and future research directions.

Abstract

Recent advances in Large Language Models (LLMs) have enabled the development of text-to-SQL models that allow clinicians to query structured data stored in Electronic Health Records (EHRs) using natural language. However, deploying these models for EHR question answering (QA) systems in safety-critical clinical environments remains challenging: incorrect SQL queries, whether caused by model errors or problematic user inputs, can undermine clinical decision-making and jeopardize patient care. While prior work has mainly focused on improving SQL generation accuracy or filtering questions before execution, there is a lack of a unified benchmark for evaluating independent post-hoc verification mechanisms (i.e., a component that inspects and validates the generated SQL before execution), which is crucial for safe deployment. To fill this gap, we introduce SCARE, a benchmark for evaluating…

Taxonomy

Topics: Topic Modeling · Electronic Health Records Systems · Machine Learning in Healthcare