Automating Pharmacovigilance Evidence Generation: Using Large Language Models to Produce Context-Aware SQL
Jeffery L. Painter, Venkateswara Rao Chalamalasetti, Raymond Kassekert, and Andrew Bate

TL;DR
This paper demonstrates that using a business context document with GPT-4 significantly improves the accuracy of converting natural language queries into SQL for pharmacovigilance databases, enhancing data retrieval efficiency.
Contribution
The study introduces a novel retrieval-augmented generation framework that leverages contextual knowledge to improve LLM-based NLQ-to-SQL conversion accuracy in pharmacovigilance.
Findings
NLQ-to-SQL accuracy increased from 8.3% to 78.3% with context.
Performance remained high across different query complexities.
Contextual knowledge is critical for accurate SQL generation.
Abstract
Objective: To enhance the efficiency and accuracy of information retrieval from pharmacovigilance (PV) databases by employing Large Language Models (LLMs) to convert natural language queries (NLQs) into Structured Query Language (SQL) queries, leveraging a business context document. Materials and Methods: We utilized OpenAI's GPT-4 model within a retrieval-augmented generation (RAG) framework, enriched with a business context document, to transform NLQs into syntactically precise SQL queries. Each NLQ was presented to the LLM randomly and independently to prevent memorization. The study was conducted in three phases, varying query complexity, and assessing the LLM's performance both with and without the business context document. Results: Our approach significantly improved NLQ-to-SQL accuracy, increasing from 8.3\% with the database schema alone to 78.3\% with the business context…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsScientific Computing and Data Management
MethodsResidual Connection · Softmax · Layer Normalization · Byte Pair Encoding · Label Smoothing · Adam · Attention Is All You Need · Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer
