DrugRAG: Enhancing Pharmacy LLM Performance Through A Novel Retrieval-Augmented Generation Pipeline

Houman Kazemzadeh; Kiarash Mokhtari Dizaji; Seyed Reza Tavakoli; Farbod Davoodi; MohammadReza KarimiNejad; Parham Abed Azad; Fatemeh Latifi; Ali Sabzi; Armin Khosravi; Siavash Ahmadi; Babak Khalaj; Mohammad Hossein Rohban; Glolamali Aminian; Zohreh Amoozgar; Tahereh Javaheri

arXiv:2512.14896 · cs.CL · May 21, 2026

TL;DR

This paper introduces DrugRAG, a retrieval-augmented generation pipeline that improves the accuracy of large language models on pharmacy question answering by integrating external drug knowledge, without modifying the models themselves.

Contribution

The study presents DrugRAG, a novel external knowledge retrieval method that raises LLM accuracy on pharmacy licensure-style questions by 7 to 21 percentage points without changing model architecture or parameters.

Findings

01. DrugRAG increased accuracy by 7 to 21 percentage points across all five evaluated models.
02. Statistically significant improvements were observed mainly in smaller and mid-sized open-source models.
03. Baseline benchmarking of ten LLMs showed accuracy ranging from 46% to 92%, with GPT-5 (92%) and o3 (89%) scoring highest.
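On a 141-question benchmark, one question is worth about 0.71 percentage points, so the reported 7 to 21 point gains correspond to roughly 10 to 30 additional correct answers. The summary does not say which significance test was used. Because the baseline and DrugRAG runs answer the same questions, a paired test such as McNemar's exact test is one plausible choice. The sketch below assumes that test; `pp_to_questions`, `mcnemar_exact`, and the demo outcome vectors are hypothetical illustrations, not the paper's code or data.

```python
from math import comb

N_QUESTIONS = 141  # benchmark size stated in the abstract


def pp_to_questions(pp: float, n: int = N_QUESTIONS) -> float:
    """Convert a percentage-point accuracy change into a number of questions."""
    return pp / 100 * n


def mcnemar_exact(baseline: list[bool], augmented: list[bool]) -> float:
    """Two-sided exact McNemar p-value for paired per-question correctness.

    Only discordant questions carry information:
      b = wrong at baseline, right with retrieval (fixed)
      c = right at baseline, wrong with retrieval (broken)
    Under the null hypothesis, b ~ Binomial(b + c, 0.5).
    """
    if len(baseline) != len(augmented):
        raise ValueError("outcomes must be paired question by question")
    b = sum(1 for x, y in zip(baseline, augmented) if not x and y)
    c = sum(1 for x, y in zip(baseline, augmented) if x and not y)
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    k = min(b, c)
    # Probability of a split at least this lopsided in one tail, doubled.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Hypothetical outcomes (NOT the paper's data): 20 questions fixed,
    # 2 broken, 100 right in both runs, 19 wrong in both runs.
    base = [False] * 20 + [True] * 2 + [True] * 100 + [False] * 19
    aug = [True] * 20 + [False] * 2 + [True] * 100 + [False] * 19
    gain_pp = 100 * (sum(aug) - sum(base)) / len(base)
    print(f"gain: {gain_pp:.1f} pp, p = {mcnemar_exact(base, aug):.2g}")
    print(f"7-21 pp = {pp_to_questions(7):.1f}-{pp_to_questions(21):.1f} questions")
```

With these made-up vectors the net gain is 18 questions (about 12.8 percentage points) and the exact p-value is about 1.2e-4. Only questions whose outcome changed enter the test, which is why paired designs can detect modest accuracy gains on a small benchmark.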

Abstract

In our study, we evaluated large language model (LLM) performance on pharmacy licensure-style question-answering tasks and developed an external knowledge integration method to improve accuracy. We benchmarked ten LLMs with varying parameter sizes (8 billion to 70+ billion) using a 141-question pharmacy dataset, measuring baseline accuracy without modification. Baseline performance ranged from 46% to 92%, with GPT-5 (92%) and o3 (89%) achieving the highest scores, while smaller open-source models showed substantially lower performance. We then developed DrugRAG, a three-step retrieval-augmented generation (RAG) pipeline that retrieves structured, evidence-based drug information and augments model prompts with contextual pharmacological evidence, operating externally and requiring no changes to model architecture or parameters. DrugRAG increased accuracy across all five evaluated models,…
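The abstract describes a three-step pipeline that retrieves structured drug information and adds it to the prompt, leaving the model untouched, but it does not name the steps. The sketch below assumes a generic split: (1) detect drug names in the question, (2) look up structured records, (3) prepend the evidence to the unchanged question. `DrugRecord`, `DRUG_KB`, its illustrative entries, the keyword matcher, and `drugrag_style_prompt` are all hypothetical stand-ins, not DrugRAG's actual components.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class DrugRecord:
    """Hypothetical structured drug entry; field names are illustrative only."""
    name: str
    drug_class: str
    note: str


# Hypothetical in-memory knowledge base standing in for a real drug database.
DRUG_KB: dict[str, DrugRecord] = {
    "warfarin": DrugRecord("warfarin", "vitamin K antagonist anticoagulant",
                           "dose is adjusted based on INR monitoring"),
    "metformin": DrugRecord("metformin", "biguanide",
                            "first-line oral agent for type 2 diabetes"),
}


def extract_drug_mentions(question: str, kb: dict[str, DrugRecord]) -> list[str]:
    """Step 1 (assumed): find known drug names mentioned in the question."""
    tokens = set(re.findall(r"[a-z]+", question.lower()))
    return [name for name in kb if name in tokens]


def retrieve_records(names: list[str], kb: dict[str, DrugRecord]) -> list[DrugRecord]:
    """Step 2 (assumed): look up the structured record for each mentioned drug."""
    return [kb[name] for name in names]


def build_augmented_prompt(question: str, records: list[DrugRecord]) -> str:
    """Step 3 (assumed): prepend retrieved evidence; the question text is unchanged."""
    if not records:
        return question  # nothing retrieved: fall back to the plain question
    context = "\n".join(f"- {r.name}: {r.drug_class}; {r.note}" for r in records)
    return ("Use the following drug information if relevant.\n"
            f"Drug information:\n{context}\n\nQuestion: {question}")


def drugrag_style_prompt(question: str, kb: dict[str, DrugRecord] = DRUG_KB) -> str:
    """Run the three assumed steps and return a prompt for any unmodified LLM."""
    names = extract_drug_mentions(question, kb)
    return build_augmented_prompt(question, retrieve_records(names, kb))


if __name__ == "__main__":
    print(drugrag_style_prompt("Which lab value is monitored in a patient on warfarin?"))
```

Because the output is just a prompt string, the same augmentation can be sent to any of the benchmarked models, which mirrors the abstract's point that the method operates externally and needs no changes to model weights. A production system would replace the keyword matcher and dictionary with a proper drug-name recognizer and a curated database.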

Taxonomy

Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare