Leveraging Language Models and RAG for Efficient Knowledge Discovery in Clinical Environments

Seokhwan Ko; Donghyeon Lee; Jaewoo Chun; Hyungsoo Han; Junghwan Cho

arXiv:2601.04209 · cs.CL · January 9, 2026

Open Access

TL;DR

This paper presents a local RAG system combining PubMedBERT and LLaMA3 to facilitate biomedical knowledge discovery in clinical settings with strict privacy requirements.

Contribution

It introduces a local RAG framework that pairs a domain-specific encoder (PubMedBERT) with a lightweight, locally deployed LLM (LLaMA3) to recommend research collaborators without moving sensitive data off local infrastructure.

Findings

01 Effective local deployment of RAG for clinical environments

02 Successful integration of PubMedBERT and LLaMA3 models

03 Potential to enhance biomedical knowledge discovery

Abstract

Large language models (LLMs) are increasingly recognized as valuable tools across the medical environment, supporting clinical, research, and administrative workflows. However, strict privacy and network security regulations in hospital settings require that sensitive data be processed within fully local infrastructures. Within this context, we developed and evaluated a retrieval-augmented generation (RAG) system designed to recommend research collaborators based on PubMed publications authored by members of a medical institution. The system utilizes PubMedBERT for domain-specific embedding generation and a locally deployed LLaMA3 model for generative synthesis. This study demonstrates the feasibility and utility of integrating domain-specialized encoders with lightweight LLMs to support biomedical knowledge discovery under local deployment constraints.
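
The retrieve-then-generate flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy corpus, the function names (`embed`, `retrieve`, `build_prompt`), and the prompt wording are all hypothetical, and a simple bag-of-words vector stands in for PubMedBERT embeddings so the example runs with the standard library alone. In a real deployment the prompt would be passed to a locally hosted LLaMA3 model; that call is omitted here.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus of (author, publication text). Not data from the paper.
PUBLICATIONS = [
    ("Author A", "deep learning segmentation of lung CT images"),
    ("Author B", "genomic biomarkers for colorectal cancer prognosis"),
    ("Author C", "convolutional networks for chest CT nodule detection"),
]


def embed(text):
    """Stand-in encoder: bag-of-words term counts.

    Assumption: in the described system this role is played by PubMedBERT,
    which would return a dense vector instead of a sparse count vector.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, k=2):
    """Return the k publications most similar to the query, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(text)), author, text) for author, text in PUBLICATIONS]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


def build_prompt(query, hits):
    """Assemble the text that would be sent to the local generative model."""
    context = "\n".join(f"- {author}: {text}" for _, author, text in hits)
    return (
        "Using only the publications below, suggest collaborators for the topic.\n"
        f"Topic: {query}\n"
        f"Publications:\n{context}\n"
    )


query = "CT image analysis with deep learning"
hits = retrieve(query)
prompt = build_prompt(query, hits)
print(prompt)
```

Because both retrieval and prompt assembly run in-process, no publication text leaves the machine, which is the property the local-deployment constraint is meant to guarantee.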

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare · Topic Modeling