Query, Don't Train: Privacy-Preserving Tabular Prediction from EHR Data via SQL Queries

Josefa Lia Stoisser; Marc Boubnovski Martell; Kaspar Märtens; Lawrence Phillips; Stephen Michael Town; Rory Donovan-Maiye; Julien Fauqueur

arXiv:2505.21801 · cs.DB · September 23, 2025

TL;DR

This paper presents QDT, a privacy-preserving method for tabular prediction from EHR data: an LLM generates SQL queries and predicts from the aggregate statistics they return, without accessing individual records.

Contribution

QDT introduces a novel inference-time-only approach that leverages LLMs to generate SQL queries for privacy-compliant, interpretable, and effective prediction from structured health data.

Findings

1. Achieved F1 = 0.70 on hospital readmission prediction, outperforming TabPFN.
2. Enables prediction without supervised training or access to individual data.
3. Provides interpretable, schema-aware, aggregate-based inference.

Abstract

Electronic health records (EHRs) contain richly structured, longitudinal data essential for predictive modeling, yet stringent privacy regulations (e.g., HIPAA, GDPR) often restrict access to individual-level records. We introduce Query, Don't Train (QDT): a structured-data foundation-model interface enabling tabular inference via LLM-generated SQL over EHRs. Instead of training on or accessing individual-level examples, QDT uses a large language model (LLM) as a schema-aware query planner to generate privacy-compliant SQL queries from a natural language task description and a test-time input. The model then extracts summary-level population statistics through these SQL queries, and the LLM performs chain-of-thought reasoning over the results to make predictions. This inference-time-only approach enables prediction without supervised model training, ensures…
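
The query-then-reason loop described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the table `admissions`, its columns, the made-up rows, the `MIN_COHORT_SIZE` threshold, the regex-based query guard, and the helper names are all assumptions introduced for illustration. The SQL string is hand-written where QDT would have an LLM plan it, and a fixed threshold rule stands in for the LLM's chain-of-thought reasoning over the returned statistics.

```python
import re
import sqlite3

# Hypothetical threshold: suppress statistics for cohorts smaller than this.
MIN_COHORT_SIZE = 5

# Naive guard (assumption): a query must begin with "SELECT COUNT(*) AS n",
# so the first result column is always the cohort size.
_STARTS_WITH_COUNT = re.compile(r"^\s*SELECT\s+COUNT\(\*\)\s+AS\s+n\b", re.IGNORECASE)

# Stand-in for an LLM-planned query; in QDT this text would be generated.
PLANNED_SQL = (
    "SELECT COUNT(*) AS n, AVG(readmitted) AS readmit_rate "
    "FROM admissions WHERE prior_visits = ?"
)


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory table of made-up admissions (not real EHR data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE admissions (patient_id INTEGER, age INTEGER, "
        "prior_visits INTEGER, readmitted INTEGER)"
    )
    rows = [(i, 30 + i, i % 4, int(i % 4 == 3 or i % 10 == 0)) for i in range(40)]
    conn.executemany("INSERT INTO admissions VALUES (?, ?, ?, ?)", rows)
    return conn


def is_allowed_query(sql: str) -> bool:
    """Accept only single statements that start with the cohort count."""
    return ";" not in sql and _STARTS_WITH_COUNT.match(sql) is not None


def run_aggregate(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> list:
    """Run a guarded query and drop result rows for cohorts that are too small."""
    if not is_allowed_query(sql):
        raise ValueError("query rejected: must start with SELECT COUNT(*) AS n")
    rows = conn.execute(sql, params).fetchall()
    return [row for row in rows if row[0] >= MIN_COHORT_SIZE]


def predict_readmission(conn: sqlite3.Connection, prior_visits: int) -> dict:
    """Predict from cohort-level statistics only; no individual row is read."""
    cohort = run_aggregate(conn, PLANNED_SQL, (prior_visits,))
    if not cohort:
        # Cohort empty or below the threshold: decline to predict.
        return {"n": 0, "rate": None, "prediction": None}
    n, rate = cohort[0]
    # Threshold rule standing in for the LLM's reasoning step.
    return {"n": n, "rate": rate, "prediction": int(rate > 0.5)}


if __name__ == "__main__":
    demo = build_demo_db()
    print(predict_readmission(demo, prior_visits=3))
```

The regex guard is deliberately simple and would not be sufficient on its own in practice; a real deployment would more plausibly enforce aggregate-only access in the database itself (for example through restricted views or permissions) rather than by inspecting query text.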

Taxonomy

Topics: Machine Learning in Healthcare · Privacy-Preserving Technologies in Data · Artificial Intelligence in Healthcare and Education