# Integrating Fine-Tuning and Retrieval-Augmented Generation for Healthcare AI Systems: A Scoping Review

**Authors:** Bernardo G. Collaco, Prabha Srinivasagam, Cesar A. Gomez-Cabello, Syed Ali Haider, Ariana Genovese, Nadia G. Wood, Sanjay Bagaria, Mark A. Lifson, Antonio Jorge Forte

PMC · DOI: 10.3390/bioengineering13020225 · Bioengineering · 2026-02-14

## TL;DR

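This scoping review of seven studies finds that hybrid systems combining fine-tuning (FT) with retrieval-augmented generation (RAG) consistently outperformed FT-only and RAG-only approaches on healthcare tasks, with reported gains in accuracy and reductions in hallucination.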

## Contribution

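The paper is a scoping review of hybrid FT + RAG frameworks in healthcare AI. It extracts base models, FT strategies, RAG architectures, applications, and performance outcomes from the included studies, and summarizes the reported benefits and the ways implementations differ.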

## Key findings

- FT + RAG systems consistently outperformed FT-only or RAG-only approaches across QA, clinical summarization, report generation, and decision support tasks.
- Parameter-efficient FT methods (e.g., LoRA) were common, while RAG implementations varied (dense, hybrid, hierarchical, multimodal, federated).
- Reported benefits included improved accuracy, reduced hallucination, greater clinician preference, and feasibility in protected settings.

## Abstract

(1) Background: Large language models (LLMs) show promise in healthcare but are constrained by hallucinations, static knowledge, and limited domain specificity. Fine-tuning (FT) and retrieval-augmented generation (RAG) offer complementary solutions, with FT embedding domain reasoning and RAG enabling dynamic, up-to-date knowledge access. Hybrid FT + RAG frameworks have been proposed to improve factual accuracy and clinical reliability. This scoping review synthesizes current evidence on such hybrids in healthcare AI. (2) Methods: A search of PubMed, IEEE Xplore, Google Scholar, and Embase identified studies implementing explicit FT + RAG hybrids in healthcare or biomedical tasks. Eligible studies reported empirical evaluations of LLM performance or behavior. Data were extracted on base models, FT strategies, RAG architectures, applications, and performance outcomes. (3) Results: Seven studies met inclusion criteria. FT + RAG systems consistently outperformed FT-only or RAG-only approaches across QA, clinical summarization, report generation, and decision support tasks. Parameter-efficient FT methods (e.g., LoRA) were common, while RAG implementations varied (dense, hybrid, hierarchical, multimodal, federated). Reported benefits included improved accuracy, reduced hallucination, and greater clinician preference and feasibility in protected settings. (4) Conclusions: FT + RAG frameworks represent a promising direction for clinically grounded healthcare AI, combining domain-specific reasoning with transparent, up-to-date retrieval. Future work should prioritize standardized evaluation, workflow integration, and governance to enable safe deployment.
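
To make the division of labor in a hybrid concrete, the following is a minimal sketch and not the method of any reviewed study. It assumes a toy bag-of-words retriever in place of the dense or hybrid retrievers the abstract mentions, and a hypothetical `fine_tuned_generate` callable standing in for a fine-tuned (e.g., LoRA-adapted) model. The corpus strings are placeholders, not clinical content.

```python
import math
import re
from collections import Counter

# Placeholder passages (illustrative only, not real clinical guidance).
CORPUS = [
    "Placeholder note on hypertension follow-up scheduling.",
    "Placeholder note on wound care after surgery.",
    "Placeholder note on diabetes screening intervals.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, corpus, k=2):
    """RAG half: return up to k passages with non-zero similarity to the query."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(doc))), doc) for doc in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(query, passages):
    """Place retrieved passages ahead of the question as numbered context."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer citing the context:"


def answer(query, corpus, generate, k=2):
    """Hybrid: retrieval supplies the context, `generate` supplies the FT model."""
    passages = retrieve(query, corpus, k)
    return generate(build_prompt(query, passages)), passages


def fine_tuned_generate(prompt):
    """Hypothetical stand-in for a fine-tuned model; it only echoes the prompt size."""
    return f"(model output for a {len(prompt)}-character prompt)"


if __name__ == "__main__":
    reply, sources = answer("hypertension follow-up", CORPUS, fine_tuned_generate, k=1)
    print(reply)
    print(sources)
```

In this arrangement the retrieval index can be updated without retraining, while the domain adaptation lives in whatever model sits behind `generate`; that separation is the complementarity the abstract describes.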

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12938813/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12938813/full.md

## References

52 references — full list in the complete paper: https://tomesphere.com/paper/PMC12938813/full.md

---
Source: https://tomesphere.com/paper/PMC12938813