GraphRAG on Consumer Hardware: Benchmarking Local LLMs for Healthcare EHR Schema Retrieval
Peter Fernandes, Ria Kanjilal

TL;DR
This paper systematically evaluates GraphRAG for healthcare EHR schema retrieval using open-source LLMs on consumer hardware, revealing model size thresholds and retrieval design impacts on performance and reliability.
Contribution
It provides the first comprehensive benchmarking of GraphRAG with local LLMs on real-world healthcare data, highlighting practical deployment considerations.
Findings
Llama 3.1 produces the richest knowledge graph with 1,172 entities.
Qwen 2.5 achieves the best answer quality score of 3.3/5.
Models below approximately 7B parameters struggle with structured output and pipeline completion.
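The structured-output finding can be made concrete with a small validity check on a model's entity-extraction reply. This is a minimal sketch, not the paper's evaluation code and not the Microsoft GraphRAG output format: it assumes a hypothetical JSON reply containing an `entities` list whose items carry `name` and `type` fields, and the function names are illustrative.

```python
import json

# Hypothetical schema for illustration only; not taken from the paper.
REQUIRED_KEYS = {"name", "type"}


def is_valid_extraction(reply: str) -> bool:
    """Return True if the reply is JSON with a non-empty, well-formed 'entities' list."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return False  # free-form text or truncated JSON
    if not isinstance(data, dict):
        return False
    entities = data.get("entities")
    if not isinstance(entities, list) or not entities:
        return False
    # Every entity must be an object carrying all required keys.
    return all(isinstance(e, dict) and REQUIRED_KEYS.issubset(e) for e in entities)


def completion_rate(replies: list[str]) -> float:
    """Fraction of replies that pass the validity check."""
    if not replies:
        return 0.0
    return sum(is_valid_extraction(r) for r in replies) / len(replies)
```

Running `completion_rate` over the replies collected from each model would give a rough per-model measure of how often an extraction step yields usable output, under the assumed schema.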
Abstract
Graph-based Retrieval Augmented Generation (GraphRAG) extends retrieval-augmented generation to support structured reasoning over complex corpora, but its reliability under resource-constrained, privacy-sensitive deployments remains unclear. In healthcare, where Electronic Health Record (EHR) data is complex and strictly regulated, reliance on cloud-based large language models (LLMs) introduces challenges in cost, latency, and compliance. In this work, we present a systematic evaluation of GraphRAG for EHR schema retrieval using locally deployed open-source LLMs. We implement the Microsoft GraphRAG pipeline on real-world EHR schema documentation and benchmark four models: Llama 3.1 (8B), Mistral (7B), Qwen 2.5 (7B), and Phi-4-mini (3.8B), each deployed via Ollama on a single consumer GPU (8 GB VRAM). We evaluate indexing efficiency, knowledge graph construction, query latency,…
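Query latency, one of the metrics the abstract lists, can be approximated for a locally served model by timing calls to Ollama's REST endpoint. The sketch below is an assumption-laden illustration rather than the authors' benchmark harness: it times a single raw generation request instead of a full GraphRAG query, it assumes Ollama is listening on its default port, and the helper names are hypothetical.

```python
import json
import statistics
import time
import urllib.request

# Ollama's default local endpoint; adjust if the server runs elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def timed_generate(model: str, prompt: str, url: str = OLLAMA_URL) -> tuple[str, float]:
    """Send one non-streaming generation request and return (text, wall-clock seconds)."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    start = time.perf_counter()
    with urllib.request.urlopen(request) as response:
        body = json.loads(response.read())
    elapsed = time.perf_counter() - start
    return body["response"], elapsed


def summarize_latencies(seconds: list[float]) -> dict[str, float]:
    """Aggregate per-query timings into simple summary statistics."""
    return {
        "mean": statistics.mean(seconds),
        "median": statistics.median(seconds),
        "max": max(seconds),
    }
```

With a running Ollama server and a pulled model, one could collect `timed_generate(model_tag, question)[1]` over a fixed question set and pass the timings to `summarize_latencies`; the model tag and question set are left to the reader, since the excerpt does not specify them.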
