EdgeRAG: Online-Indexed RAG for Edge Devices

Korakit Seemakhupt; Sihang Liu; Samira Khan

arXiv:2412.21023 · cs.LG · January 3, 2025

Open Access

TL;DR

EdgeRAG introduces an efficient retrieval method for resource-limited edge devices by combining embedding pruning, on-demand generation, and adaptive caching to reduce latency without sacrificing quality.

Contribution

We propose EdgeRAG, a novel approach that enables effective retrieval augmented generation on edge devices through memory-efficient embedding management and adaptive caching strategies.

Findings

01. Significant latency reduction over the baseline IVF index.

02. Generation quality similar to that of existing methods.

03. All evaluated datasets fit into the limited memory of edge devices.

Abstract

Deploying Retrieval Augmented Generation (RAG) on resource-constrained edge devices is challenging due to limited memory and processing power. In this work, we propose EdgeRAG, which addresses the memory constraint by pruning embeddings within clusters and generating embeddings on demand during retrieval. To avoid the latency of generating embeddings for large tail clusters, EdgeRAG pre-computes and stores embeddings for these clusters, while adaptively caching the remaining embeddings to minimize redundant computation and further reduce latency. Results on the BEIR suite show that EdgeRAG offers significant latency reduction over the baseline IVF index with similar generation quality, while allowing all of our evaluated datasets to fit into memory.
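The three mechanisms named in the abstract (pruned per-cluster embeddings, on-demand generation, and caching) can be illustrated with a minimal sketch. This is not the authors' implementation. The class `PrunedIVFIndex`, the toy `embed` function, the chunk-count threshold for deciding which clusters are pre-computed, and the plain LRU cache are all assumptions made for illustration; the paper describes its caching only as "adaptive", and a real system would use an actual embedding model.

```python
import math
from collections import OrderedDict


def embed(text):
    """Hypothetical stand-in for an embedding model: normalized letter counts."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class PrunedIVFIndex:
    """Illustrative two-level index: centroids always stay in memory,
    per-chunk embeddings are kept only for large clusters."""

    def __init__(self, clusters, precompute_threshold, cache_size):
        self.clusters = clusters      # cluster id -> list of text chunks
        self.centroids = {}           # first level, always resident
        self.stored = {}              # pre-computed embeddings (large clusters)
        self.cache = OrderedDict()    # LRU cache; an assumed, simplified policy
        self.cache_size = cache_size
        self.regenerated = 0          # clusters re-embedded at query time
        for cid, chunks in clusters.items():
            embs = [embed(c) for c in chunks]
            dim = len(embs[0])
            self.centroids[cid] = [
                sum(e[i] for e in embs) / len(embs) for i in range(dim)
            ]
            # Assumption: cluster size stands in for embedding-generation cost.
            if len(chunks) >= precompute_threshold:
                self.stored[cid] = embs
            # Otherwise the per-chunk embeddings are pruned (discarded) here.

    def _cluster_embeddings(self, cid):
        if cid in self.stored:
            return self.stored[cid]
        if cid in self.cache:
            self.cache.move_to_end(cid)  # mark as most recently used
            return self.cache[cid]
        # On-demand generation for a pruned cluster.
        embs = [embed(c) for c in self.clusters[cid]]
        self.regenerated += 1
        self.cache[cid] = embs
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # evict least recently used
        return embs

    def search(self, query, top_k=1):
        q = embed(query)
        # First level: pick the cluster with the closest centroid.
        cid = max(self.centroids, key=lambda c: dot(q, self.centroids[c]))
        # Second level: rank the chunks inside that cluster.
        embs = self._cluster_embeddings(cid)
        ranked = sorted(
            zip(self.clusters[cid], embs),
            key=lambda pair: dot(q, pair[1]),
            reverse=True,
        )
        return [chunk for chunk, _ in ranked[:top_k]]
```

In this sketch, a repeated query against a small (pruned) cluster triggers embedding generation only once, and a query against a large cluster never triggers it, which is the latency trade-off the abstract describes.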

Taxonomy

Topics: Advanced Optical Sensing Technologies · Advanced Semiconductor Detectors and Materials · Infrared Target Detection Methodologies

Methods: Pruning