FD-NL2SQL: Feedback-Driven Clinical NL2SQL that Improves with Use

Suparno Roy Chowdhury; Tejas Anvekar; Manan Roy Choudhury; Muhammad Ali Khan; Kaneez Zahra Rubab Khakwani; Mohamad Bassam Sonbol; Irbaz Bin Riaz; Vivek Gupta

arXiv:2604.15646 · cs.CL · April 20, 2026

TL;DR

FD-NL2SQL is an interactive, feedback-driven system that translates natural language questions into SQL queries for oncology databases, improving accuracy through user feedback and exemplar augmentation.

Contribution

The paper introduces a schema-aware LLM pipeline for clinical oncology whose feedback mechanisms (approved clinician edits and automatic SQL augmentation) grow the exemplar bank, so NL2SQL performance improves with use.

Findings

01. Incorporates clinician feedback to expand the exemplar bank.

02. Uses schema-aware decomposition for accurate SQL generation.

03. Supports continuous improvement through automatic exemplar expansion.
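
The abstract describes automatic expansion as applying a single atomic mutation (e.g., an operator or column change) to existing SQL. The following is a minimal sketch of the operator case only, not the authors' implementation. The swap table, the toy `trials` schema, and the helper names are all hypothetical, and because the abstract is cut off before it states what is retained, the executability filter here is an assumption.

```python
import re
import sqlite3

# Hypothetical swap table: each comparison operator maps to its logical complement.
OPERATOR_SWAPS = {">=": "<", "<=": ">", ">": "<=", "<": ">="}
COMPARISON = re.compile(r">=|<=|>|<")

def mutate_operator(sql):
    """Return one variant per comparison operator, each differing from `sql` by a single swap."""
    variants = []
    for match in COMPARISON.finditer(sql):
        swapped = OPERATOR_SWAPS[match.group()]
        variants.append(sql[:match.start()] + swapped + sql[match.end():])
    return variants

def is_executable(conn, sql):
    """Assumed retention filter: keep a variant only if SQLite can run it."""
    try:
        conn.execute(sql).fetchall()
        return True
    except sqlite3.Error:
        return False

# Toy schema, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trials (id INTEGER, phase INTEGER, median_os_months REAL)")

seed = "SELECT id FROM trials WHERE phase >= 2 AND median_os_months < 12"
exemplar_bank = [seed]
exemplar_bank += [v for v in mutate_operator(seed) if is_executable(conn, v)]
```

The regex does not skip string literals or the `<>` operator, so a production version would mutate a parsed query instead of raw text. A real exemplar also pairs each SQL query with a natural-language question; how mutated queries obtain theirs is not covered by this sketch.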

Abstract

Clinicians exploring oncology trial repositories often need ad-hoc, multi-constraint queries over biomarkers, endpoints, interventions, and time, yet writing SQL requires schema expertise. We demo FD-NL2SQL, a feedback-driven clinical NL2SQL assistant for SQLite-based oncology databases. Given a natural-language question, a schema-aware LLM decomposes it into predicate-level sub-questions, retrieves semantically similar expert-verified NL2SQL exemplars via sentence embeddings, and synthesizes executable SQL conditioned on the decomposition, retrieved exemplars, and schema, with post-processing validity checks. To improve with use, FD-NL2SQL incorporates two update signals: (i) clinician edits of generated SQL are approved and added to the exemplar bank; and (ii) lightweight logic-based SQL augmentation applies a single atomic mutation (e.g., operator or column change), retaining…
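
The exemplar-retrieval step described above can be sketched as a nearest-neighbor lookup over embedded questions. This is an illustrative sketch under stated assumptions: the bag-of-words `embed` function is a toy stand-in for a real sentence-embedding model, and the exemplar bank, table names, and `retrieve` helper are hypothetical.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words vector; a stand-in for a real sentence-embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question, bank, k=2):
    """Return the k exemplars whose questions are most similar to `question`."""
    query_vec = embed(question)
    ranked = sorted(bank, key=lambda ex: cosine(query_vec, embed(ex["nl"])), reverse=True)
    return ranked[:k]

# Hypothetical expert-verified exemplars (question, SQL) over an invented schema.
bank = [
    {"nl": "trials with median overall survival above 12 months",
     "sql": "SELECT id FROM trials WHERE median_os_months > 12"},
    {"nl": "phase 3 trials of pembrolizumab",
     "sql": "SELECT id FROM trials WHERE phase = 3 AND drug = 'pembrolizumab'"},
    {"nl": "trials enrolling patients with EGFR mutation",
     "sql": "SELECT id FROM trials WHERE biomarker = 'EGFR'"},
]

top = retrieve("phase 3 trials with overall survival above 24 months", bank, k=2)
```

The retrieved exemplars would then be placed in the LLM prompt alongside the decomposition and schema. Swapping `embed` for a dense sentence encoder changes only that one function, since `retrieve` depends only on the similarity score.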
