Accurate and Energy Efficient: Local Retrieval-Augmented Generation Models Outperform Commercial Large Language Models in Medical Tasks

Konstantinos Vrettos; Michail E. Klontzas

arXiv:2506.20009 · cs.AI · June 26, 2025

TL;DR

This study develops a local Retrieval-Augmented Generation framework for medical tasks that outperforms commercial large language models in accuracy and energy efficiency, promoting sustainable AI in healthcare.

Contribution

The paper introduces a customizable RAG framework that leverages open-source LLMs for medical tasks, demonstrating superior performance and energy efficiency over commercial models.

Findings

01. Custom RAG models outperform commercial models in accuracy.

02. Llama3.1-RAG has the lowest energy consumption and CO2 footprint.

03. Llama3.1-RAG achieves 2.7x more accuracy per kWh than o4-mini.
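
The "accuracy per kWh" figure in the third finding is a ratio of accuracy to energy consumed. The sketch below shows one way such a ratio, and the relative factor between two models, might be computed. The function names and all input numbers are illustrative placeholders and are not taken from the paper.

```python
def accuracy_per_kwh(accuracy_pct: float, energy_kwh: float) -> float:
    """Accuracy percentage points obtained per kWh of energy consumed."""
    if energy_kwh <= 0:
        raise ValueError("energy_kwh must be positive")
    return accuracy_pct / energy_kwh


def efficiency_ratio(model_a: tuple[float, float],
                     model_b: tuple[float, float]) -> float:
    """How many times more accuracy per kWh model_a delivers than model_b."""
    return accuracy_per_kwh(*model_a) / accuracy_per_kwh(*model_b)


# Placeholder inputs as (accuracy %, energy in kWh); NOT results from the paper.
local_rag = (60.0, 0.5)
commercial = (50.0, 1.0)

ratio = efficiency_ratio(local_rag, commercial)
print(f"{ratio:.1f}x more accuracy per kWh")
```

Because the metric divides by energy, a model with slightly lower accuracy can still come out ahead if it consumes much less electricity.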

Abstract

Background: The increasing adoption of Artificial Intelligence (AI) in healthcare has sparked growing concerns about its environmental and ethical implications. Commercial Large Language Models (LLMs), such as ChatGPT and DeepSeek, require substantial resources, while the use of these systems for medical purposes raises critical issues regarding patient privacy and safety.

Methods: We developed a customizable Retrieval-Augmented Generation (RAG) framework for medical tasks, which monitors its energy usage and CO2 emissions. This system was then used to create RAGs based on various open-source LLMs. The tested models included both general-purpose models, such as llama3.1:8b, and the medical-domain-specific medgemma-4b-it. The performance and energy consumption of the best RAG were compared with those of DeepSeekV3-R1 and OpenAI's o4-mini model. A dataset of medical questions was used for the…
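
The Methods describe a framework that monitors its own energy usage and CO2 emissions, but this excerpt does not say how the measurement is done. As a rough illustration only, the sketch below times a call and converts the runtime into kWh and grams of CO2 using an assumed average power draw and an assumed grid carbon intensity. `answer_question`, `measure`, `estimate_energy`, and every numeric value are hypothetical and do not come from the paper.

```python
import time
from dataclasses import dataclass


@dataclass
class EnergyReport:
    seconds: float
    energy_kwh: float
    co2_grams: float


def estimate_energy(seconds: float, avg_power_watts: float,
                    grid_g_co2_per_kwh: float) -> EnergyReport:
    """Convert runtime and an assumed average power draw into kWh and grams of CO2."""
    energy_kwh = avg_power_watts * seconds / 3_600_000  # watt-seconds (joules) to kWh
    return EnergyReport(seconds, energy_kwh, energy_kwh * grid_g_co2_per_kwh)


def measure(fn, *args, avg_power_watts: float, grid_g_co2_per_kwh: float):
    """Run fn(*args), time it, and return (result, EnergyReport)."""
    start = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - start
    return result, estimate_energy(elapsed, avg_power_watts, grid_g_co2_per_kwh)


# Hypothetical stand-in for a RAG call; a real system would retrieve and generate here.
def answer_question(question: str) -> str:
    return f"(answer to: {question})"


# Assumed values: 200 W average draw, 400 g CO2 per kWh grid intensity.
answer, report = measure(answer_question, "example question",
                         avg_power_watts=200.0, grid_g_co2_per_kwh=400.0)
print(report)
```

A fixed average power draw is a simplification; a real monitor would more likely sample hardware power counters during the run.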

Taxonomy

Topics: Topic Modeling

Methods: Attention Dropout · Dropout · Byte Pair Encoding · Softmax · Dense Connections · Layer Normalization · Linear Warmup With Linear Decay · BERT · BART