DIRC-RAG: Accelerating Edge RAG with Robust High-Density and High-Loading-Bandwidth Digital In-ReRAM Computation

Kunming Shao; Zhipeng Liao; Jiangnan Yu; Liang Zhao; Qiwei Li; Xijie Huang; Jingyu He; Fengshi Tian; Yi Zou; Xiaomeng Wang; Tim Kwang-Ting Cheng; Chi-Ying Tsui

arXiv:2510.25278 · cs.AR · October 30, 2025

TL;DR

DIRC-RAG introduces a high-density, low-power digital in-ReRAM computation architecture to accelerate retrieval-augmented generation on edge devices, reducing retrieval latency to 5.6 μs/query and energy to 0.956 μJ/query while maintaining retrieval precision.

Contribution

It presents a novel DIRC architecture combining high-density ReRAM with digital MAC operations, enabling efficient, robust, and low-power edge RAG acceleration.

Findings

01. Achieves 5.18 Mb/mm² memory density with 131 TOPS throughput.

02. Reduces retrieval latency to 5.6 μs/query and energy to 0.956 μJ/query.

03. Maintains retrieval precision with error optimization and detection circuits.
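
As a rough sanity check on the latency and energy figures above, the short Python sketch below derives two quantities that the summary does not state: the average power implied if the per-query energy is spent over the per-query latency, and the query rate if retrievals run strictly back to back. Both are assumptions made for illustration, and the function names are hypothetical.

```python
# Figures quoted in the findings above.
LATENCY_S = 5.6e-6    # retrieval latency per query, in seconds
ENERGY_J = 0.956e-6   # retrieval energy per query, in joules


def implied_average_power_w(energy_j: float, latency_s: float) -> float:
    """Average power if the whole per-query energy is spent during the latency window."""
    return energy_j / latency_s


def sequential_queries_per_second(latency_s: float) -> float:
    """Query rate assuming queries run one after another with no overlap (an assumption)."""
    return 1.0 / latency_s


power_w = implied_average_power_w(ENERGY_J, LATENCY_S)
qps = sequential_queries_per_second(LATENCY_S)

print(f"implied average power: {power_w * 1e3:.0f} mW")
print(f"sequential query rate: {qps:,.0f} queries/s")
```

Under these assumptions the figures correspond to roughly 171 mW during retrieval and about 178,000 queries per second; these are derived estimates, not results reported in the paper.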

Abstract

Retrieval-Augmented Generation (RAG) enhances large language models (LLMs) by integrating external knowledge retrieval, but it faces challenges on edge devices due to high storage, energy, and latency demands. Computing-in-Memory (CIM) offers a promising solution by storing document embeddings in CIM macros and enabling in-situ parallel retrievals, but it is constrained by either low memory density or limited computational accuracy. To address these challenges, we present DIRC-RAG, a novel edge RAG acceleration architecture leveraging Digital In-ReRAM Computation (DIRC). DIRC integrates a high-density multi-level ReRAM subarray with an SRAM cell, utilizing SRAM and differential sensing for robust ReRAM readout and digital multiply-accumulate (MAC) operations. By storing all document embeddings within the CIM macro, DIRC achieves ultra-low-power, single-cycle data loading, substantially reducing…
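
The retrieval step the abstract describes, scoring a query against stored document embeddings with digital MAC operations, can be modeled in software as an integer dot product followed by a top-k selection. The sketch below is a functional illustration only, not the paper's circuit or dataflow: the 8-bit symmetric quantization, the dot-product similarity, and the names `quantize`, `mac`, and `retrieve` are all assumptions.

```python
from typing import List, Sequence, Tuple


def quantize(vec: Sequence[float], bits: int = 8) -> List[int]:
    """Symmetric per-vector quantization to signed integers (bit width is an assumption)."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(x) for x in vec) or 1.0  # avoid dividing by zero for an all-zero vector
    return [round(x / scale * qmax) for x in vec]


def mac(a: Sequence[int], b: Sequence[int]) -> int:
    """Exact integer multiply-accumulate, standing in for a digital MAC unit."""
    acc = 0
    for x, y in zip(a, b):
        acc += x * y
    return acc


def retrieve(query: Sequence[int], docs: Sequence[Sequence[int]], k: int = 2) -> List[Tuple[int, int]]:
    """Score every stored embedding against the query and return the top-k (index, score) pairs."""
    scores = [(mac(query, d), i) for i, d in enumerate(docs)]
    scores.sort(key=lambda s: (-s[0], s[1]))  # highest score first, ties broken by index
    return [(i, s) for s, i in scores[:k]]


# Toy embeddings; a real system would store many high-dimensional vectors.
docs = [quantize(v) for v in [[0.9, 0.1, 0.0], [0.0, 1.0, 0.2], [0.7, 0.7, 0.1]]]
query = quantize([1.0, 0.2, 0.0])
top = retrieve(query, docs, k=2)
print(top)
```

Here the loop over `docs` is sequential, whereas the abstract describes in-situ parallel retrieval inside the macro. Per-vector scaling also normalizes each embedding by its largest component, so rankings can differ from a floating-point dot product.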
