Accurate and Energy Efficient: Local Retrieval-Augmented Generation Models Outperform Commercial Large Language Models in Medical Tasks
Konstantinos Vrettos, Michail E. Klontzas

TL;DR
This study develops a local Retrieval-Augmented Generation framework for medical tasks that outperforms commercial large language models in accuracy and energy efficiency, promoting sustainable AI in healthcare.
Contribution
The paper introduces a customizable RAG framework that leverages open-source LLMs for medical tasks, demonstrating superior performance and energy efficiency over commercial models.
Findings
Custom RAG models outperform commercial models in accuracy.
Llama3.1-RAG has the lowest energy consumption and CO2 footprint.
Llama3.1-RAG achieves 2.7x more accuracy per kWh than o4-mini.
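The "accuracy per kWh" comparison in the last finding can be read as a simple efficiency ratio. The sketch below shows one plausible way to compute it. The function names are hypothetical, and the numbers in the usage section are made-up placeholders, not values from the paper.

```python
def accuracy_per_kwh(accuracy: float, energy_kwh: float) -> float:
    """Accuracy obtained per kWh of energy consumed (assumed metric definition)."""
    if energy_kwh <= 0:
        raise ValueError("energy_kwh must be positive")
    return accuracy / energy_kwh


def efficiency_ratio(acc_a: float, kwh_a: float, acc_b: float, kwh_b: float) -> float:
    """How many times more accuracy per kWh model A delivers than model B."""
    return accuracy_per_kwh(acc_a, kwh_a) / accuracy_per_kwh(acc_b, kwh_b)


if __name__ == "__main__":
    # Placeholder inputs for illustration only; NOT measurements from the study.
    ratio = efficiency_ratio(acc_a=0.5, kwh_a=2.0, acc_b=0.4, kwh_b=4.0)
    print(f"Model A delivers {ratio:.2f}x the accuracy per kWh of model B")
```

Under this reading, a model can win on the ratio either by being more accurate, by using less energy, or both, which is why the findings report accuracy and energy consumption separately as well.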
Abstract
Background: The increasing adoption of Artificial Intelligence (AI) in healthcare has sparked growing concerns about its environmental and ethical implications. Commercial Large Language Models (LLMs), such as ChatGPT and DeepSeek, require substantial resources, while the use of these systems for medical purposes raises critical issues regarding patient privacy and safety. Methods: We developed a customizable Retrieval-Augmented Generation (RAG) framework for medical tasks, which monitors its energy usage and CO2 emissions. This system was then used to create RAGs based on various open-source LLMs. The tested models included both general-purpose models, such as llama3.1:8b, and the medical-domain-specific medgemma-4b-it. The best RAG's performance and energy consumption were compared to those of DeepSeekV3-R1 and OpenAI's o4-mini model. A dataset of medical questions was used for the…
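As a rough illustration of the kind of pipeline the Methods describe (retrieve relevant passages, pass them to a local LLM, and track energy use), here is a minimal sketch. It is not the authors' framework: the bag-of-words retriever, the prompt template, the `energy_kwh` helper, and the toy passages are all assumptions made for illustration, and the call to the local model itself is left out.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, passages: list[str], k: int = 2) -> list[str]:
    """Return the k passages most similar to the question (toy retriever)."""
    q = Counter(tokenize(question))
    ranked = sorted(
        passages,
        key=lambda p: cosine(q, Counter(tokenize(p))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble a prompt that places retrieved context before the question."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Context:\n{joined}\n\nQuestion: {question}\nAnswer:"


def energy_kwh(avg_power_watts: float, seconds: float) -> float:
    """Estimate energy as average power times duration (1 kWh = 3.6e6 J)."""
    return avg_power_watts * seconds / 3.6e6


if __name__ == "__main__":
    # Toy knowledge base; a real system would index medical literature.
    passages = [
        "Metformin is a first-line treatment for type 2 diabetes.",
        "The femur is the longest bone in the human body.",
        "Insulin lowers blood glucose.",
    ]
    question = "What is the first-line treatment for type 2 diabetes?"
    prompt = build_prompt(question, retrieve(question, passages, k=1))
    print(prompt)  # this prompt would then be sent to a locally hosted LLM
```

A production setup would typically swap the bag-of-words retriever for dense embeddings and measure power with a hardware or software monitor rather than an assumed average wattage.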
Taxonomy
Topics: Topic Modeling
Methods: Attention Dropout · Dropout · Byte Pair Encoding · Softmax · Dense Connections · Layer Normalization · Linear Warmup With Linear Decay · BERT · BART
