Retrieval augmented text-to-SQL generation for epidemiological question answering using electronic health records
Angelo Ziletti, Leonardo D'Ambrosi

TL;DR
This paper presents a retrieval-augmented text-to-SQL approach tailored for answering epidemiological questions from electronic health records, enhancing performance by integrating medical coding into the generation process.
Contribution
It introduces a novel end-to-end method combining retrieval-augmented generation with text-to-SQL, specifically adapted for complex medical data and terminology.
Findings
Significant performance improvement over simple prompting methods
Medical coding integration enhances SQL query accuracy
RAG shows promise despite current language model limitations
Abstract
Electronic health records (EHR) and claims data are rich sources of real-world data that reflect patient health status and healthcare utilization. Querying these databases to answer epidemiological questions is challenging due to the intricacy of medical terminology and the need for complex SQL queries. Here, we introduce an end-to-end methodology that combines text-to-SQL generation with retrieval augmented generation (RAG) to answer epidemiological questions using EHR and claims data. We show that our approach, which integrates a medical coding step into the text-to-SQL process, significantly improves the performance over simple prompting. Our findings indicate that although current language models are not yet sufficiently accurate for unsupervised use, RAG offers a promising direction for improving their capabilities, as shown in a realistic industry setting.
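The abstract names three ingredients: retrieval of relevant examples, text-to-SQL generation, and a medical coding step. The following is a minimal sketch of how such pieces could fit together, not the authors' implementation. Everything in it is an assumption made for illustration: the `EXAMPLES` store, the `diagnoses` and `prescriptions` tables, the `{entity}` placeholder convention, the token-overlap retriever, and the tiny `CODE_LOOKUP` dictionary. The language model call itself is omitted; the sketch only assembles the few-shot prompt and resolves placeholders in a SQL string.

```python
import re

# Hypothetical store of (question, SQL template) pairs; not taken from the paper.
# Medical entities are left as {placeholders} to be resolved by a coding step.
EXAMPLES = [
    ("How many patients were diagnosed with hypertension?",
     "SELECT COUNT(DISTINCT patient_id) FROM diagnoses WHERE code IN ({hypertension});"),
    ("How many patients were prescribed metformin?",
     "SELECT COUNT(DISTINCT patient_id) FROM prescriptions WHERE drug = 'metformin';"),
]

# Illustrative entity-to-code lookup (assumed); a real system would query a vocabulary.
CODE_LOOKUP = {"hypertension": ["I10"], "type 2 diabetes": ["E11"]}


def tokens(text):
    """Lowercase alphanumeric tokens of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, k=1):
    """Return the k stored examples most similar to the question (Jaccard overlap)."""
    q = tokens(question)

    def score(example):
        t = tokens(example[0])
        return len(q & t) / len(q | t)

    return sorted(EXAMPLES, key=score, reverse=True)[:k]


def build_prompt(question, k=1):
    """Assemble a few-shot prompt from the retrieved examples and the new question."""
    shots = "\n\n".join(f"Question: {q}\nSQL: {s}" for q, s in retrieve(question, k))
    return f"{shots}\n\nQuestion: {question}\nSQL:"


def apply_medical_coding(sql):
    """Replace each {entity} placeholder with a quoted, comma-separated code list."""
    def substitute(match):
        codes = CODE_LOOKUP[match.group(1)]
        return ", ".join(f"'{c}'" for c in codes)

    return re.sub(r"\{([^}]+)\}", substitute, sql)
```

In this arrangement, `build_prompt` would be sent to a language model, and `apply_medical_coding` would be run on the SQL it returns. Keeping the code lookup outside the model means the model never has to recall medical codes from memory, which is one plausible reading of why a separate coding step could help.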
Taxonomy
Topics: Topic Modeling · Advanced Text Analysis Techniques
Methods: Attention Is All You Need · Linear Warmup With Linear Decay · Attention Dropout · Residual Connection · Weight Decay · WordPiece · BERT · Linear Layer · Dense Connections
