Query, Don't Train: Privacy-Preserving Tabular Prediction from EHR Data via SQL Queries
Josefa Lia Stoisser, Marc Boubnovski Martell, Kaspar Märtens, Lawrence Phillips, Stephen Michael Town, Rory Donovan-Maiye, Julien Fauqueur

TL;DR
This paper presents QDT, a privacy-preserving method for tabular prediction from EHR data that uses LLM-generated SQL queries to make predictions from aggregate statistics without accessing individual records.
Contribution
QDT introduces a novel inference-time-only approach that leverages LLMs to generate SQL queries for privacy-compliant, interpretable, and effective prediction from structured health data.
Findings
Achieved F1=0.70 on hospital readmission prediction, outperforming TabPFN.
Enables prediction without supervised training or access to individual data.
Provides interpretable, schema-aware, aggregate-based inference.
Abstract
Electronic health records (EHRs) contain richly structured, longitudinal data essential for predictive modeling, yet stringent privacy regulations (e.g., HIPAA, GDPR) often restrict access to individual-level records. We introduce Query, Don't Train (QDT): a structured-data foundation-model interface enabling tabular inference via LLM-generated SQL over EHRs. Instead of training on or accessing individual-level examples, QDT uses a large language model (LLM) as a schema-aware query planner to generate privacy-compliant SQL queries from a natural language task description and a test-time input. The model then extracts summary-level population statistics through these SQL queries, and the LLM performs chain-of-thought reasoning over the results to make predictions. This inference-time-only approach enables prediction without supervised model training, ensures…
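The aggregate-only query pattern described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `admissions` table, its columns, the toy rows, the `MIN_GROUP_SIZE` suppression threshold, and the 0.5 decision rule are all hypothetical, and a fixed SQL template stands in for the query the LLM would generate. In QDT the final prediction comes from LLM chain-of-thought reasoning over the query results, which the simple threshold below only approximates.

```python
import sqlite3

# Hypothetical suppression threshold: cohorts smaller than this are not reported.
# This guard is an assumption for illustration, not a detail from the paper.
MIN_GROUP_SIZE = 3


def build_demo_db():
    """Create an in-memory database with a toy, made-up admissions table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE admissions ("
        "age INTEGER, prior_admissions INTEGER, readmitted INTEGER)"
    )
    rows = [
        (72, 3, 1), (68, 2, 1), (75, 4, 0), (70, 1, 1),
        (34, 0, 0), (29, 1, 0), (41, 0, 0), (38, 0, 1),
    ]
    conn.executemany("INSERT INTO admissions VALUES (?, ?, ?)", rows)
    return conn


def cohort_readmission_rate(conn, age_lo, age_hi):
    """Return (cohort size, readmission rate) using only aggregate functions.

    The query returns a single summary row, never individual records.
    In QDT the SQL would be generated by the LLM; here it is a fixed template.
    """
    sql = (
        "SELECT COUNT(*), AVG(readmitted) "
        "FROM admissions WHERE age BETWEEN ? AND ?"
    )
    n, rate = conn.execute(sql, (age_lo, age_hi)).fetchone()
    if n < MIN_GROUP_SIZE:
        return None  # suppress statistics for very small cohorts
    return n, rate


def predict_readmission(conn, patient_age, band=5):
    """Stand-in for the reasoning step: threshold the cohort rate at 0.5."""
    stats = cohort_readmission_rate(conn, patient_age - band, patient_age + band)
    if stats is None:
        return None  # not enough population data to support a prediction
    _, rate = stats
    return rate >= 0.5
```

Calling `predict_readmission(build_demo_db(), 71)` queries the cohort aged 66 to 76 and bases the prediction solely on that cohort's count and mean readmission rate. The small-cohort guard is one common way to reduce re-identification risk from aggregates; whether QDT applies such a rule is not stated in the text above.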
Taxonomy
Topics: Machine Learning in Healthcare · Privacy-Preserving Technologies in Data · Artificial Intelligence in Healthcare and Education
