Question Answering on Patient Medical Records with Private Fine-Tuned LLMs
Sara Kothari, Ayush Gupta

TL;DR
This paper presents a privacy-preserving approach for semantic question answering over electronic health records using fine-tuned LLMs, demonstrating superior performance over larger models in specific medical QA tasks.
Contribution
It introduces a two-stage method for EHRs that first identifies the FHIR resources relevant to a query and then answers the query from those resources, using private, fine-tuned LLMs that outperform benchmark models such as GPT-4.
Findings
Fine-tuned LLMs outperform GPT-4 by 0.55% in F1 score on resource identification (Task1).
The smaller fine-tuned models score 42% higher on METEOR for question answering (Task2).
Sequential fine-tuning and self-evaluation improve model performance.
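As a reminder of what the F1 figure measures, the sketch below computes F1 for a binary relevance decision over a handful of resources. The labels are made up for illustration, and whether the paper reports binary, micro-averaged, or macro-averaged F1 is not stated on this page, so treat this as a generic worked example rather than the authors' evaluation code.

```python
from typing import Sequence


def f1_score(gold: Sequence[bool], predicted: Sequence[bool]) -> float:
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)        # relevant, predicted relevant
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)    # irrelevant, predicted relevant
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)    # relevant, missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: is each of five resources relevant to the query?
gold = [True, True, False, False, True]
pred = [True, False, False, True, True]

score = f1_score(gold, pred)
print(f"F1 = {score:.3f}")
```

Here two resources are correctly flagged, one is a false alarm, and one is missed, so precision and recall are both 2/3.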
Abstract
Healthcare systems continuously generate vast amounts of electronic health records (EHRs), commonly stored in the Fast Healthcare Interoperability Resources (FHIR) standard. Despite the wealth of information in these records, their complexity and volume make it difficult for users to retrieve and interpret crucial health insights. Recent advances in Large Language Models (LLMs) offer a solution, enabling semantic question answering (QA) over medical data, allowing users to interact with their health records more effectively. However, ensuring privacy and compliance requires edge and private deployments of LLMs. This paper proposes a novel approach to semantic QA over EHRs by first identifying the most relevant FHIR resources for a user query (Task1) and subsequently answering the query based on these resources (Task2). We explore the performance of privately hosted, fine-tuned LLMs,…
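The two-task flow described in the abstract can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the authors' implementation: `FhirResource`, `select_relevant`, `answer_query`, the prompt layout, and the keyword-overlap classifier are all hypothetical names and stand-ins. In practice both callables would be backed by privately hosted, fine-tuned LLMs.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class FhirResource:
    resource_type: str  # e.g. "Observation", "MedicationRequest"
    text: str           # flattened, human-readable rendering of the resource


def select_relevant(
    query: str,
    resources: List[FhirResource],
    is_relevant: Callable[[str, FhirResource], bool],
) -> List[FhirResource]:
    """Task1: keep only the resources the classifier marks as relevant."""
    return [r for r in resources if is_relevant(query, r)]


def answer_query(
    query: str,
    resources: List[FhirResource],
    is_relevant: Callable[[str, FhirResource], bool],
    generate: Callable[[str], str],
) -> str:
    """Task2: build a prompt from the Task1 output and generate an answer."""
    relevant = select_relevant(query, resources, is_relevant)
    context = "\n".join(f"[{r.resource_type}] {r.text}" for r in relevant)
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return generate(prompt)


# Toy stand-ins (assumptions) so the sketch runs without any model.
def keyword_classifier(query: str, resource: FhirResource) -> bool:
    """Marks a resource relevant if it shares at least two words with the query."""
    query_words = {w.strip("?.,").lower() for w in query.split()}
    resource_words = {w.strip("?.,").lower() for w in resource.text.split()}
    return len(query_words & resource_words) >= 2


def echo_generator(prompt: str) -> str:
    """Returns the prompt unchanged; a real deployment would call an LLM here."""
    return prompt


records = [
    FhirResource("Observation", "Blood pressure 120/80 mmHg recorded 2024-01-05"),
    FhirResource("MedicationRequest", "Metformin 500 mg twice daily"),
]
query = "What was my blood pressure?"
selected = select_relevant(query, records, keyword_classifier)
output = answer_query(query, records, keyword_classifier, echo_generator)
```

Passing the classifier and generator as callables keeps the two tasks separable, so a Task1 model can be swapped or evaluated independently of the Task2 model.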
Code & Models
- 🤗 genloop/FHIR_QnA_Relevance_Classification_Llama-3.1-8B-BASE-FT_T1 (model)
- 🤗 genloop/FHIR_QnA_Relevance_Classification_Mistral-NeMo-BASE-FT_T1 (model)
- 🤗 genloop/FHIR_QnA_Relevance_Classification_Mistral-NeMo-Instruct-FT_T1 (model)
- 🤗 genloop/FHIR_QnA_Relevance_Classification_Llama-3.1-8b-Base-FT_T2_T1 (model)
- 🤗 genloop/FHIR_QnA_Relevance_Classification_Mistral-Nemo-Base-FT_T2_T1 (model)
- 🤗 genloop/FHIR_QnA_Llama-3.1-8b-500-Base-FT_T1_T2 (model)
- 🤗 genloop/FHIR_QnA_Llama-3.1-8b-4900-Base-FT_T1_T2 (model)
- 🤗 genloop/FHIR_QnA_Mistral-NeMo-Base-500-FT_T1_T2 (model)
- 🤗 genloop/FHIR_QnA_Mistral-NeMo-Base-4900-FT_T1_T2 (model)
- 🤗 genloop/FHIR_QnA_Mistral-NeMo-Base-4900-FT_T2 (model)
Taxonomy
Topics: Topic Modeling
Methods: Attention Is All You Need · Adam · Softmax · Absolute Position Encodings · Residual Connection · Dropout · Byte Pair Encoding · Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer
