PerCache: Predictive Hierarchical Cache for RAG Applications on Mobile Devices

Kaiwei Liu; Liekang Zeng; Lilin Xu; Bufang Yang; Zhenyu Yan

arXiv:2601.11553·cs.DC·January 21, 2026

TL;DR

PerCache is a hierarchical caching system designed for mobile RAG applications that predicts and reuses intermediate results to significantly reduce latency, adapting dynamically to system load changes.

Contribution

It introduces a novel hierarchical cache architecture with predictive query population and adaptive configuration for mobile RAG systems.

Findings

01. Achieves a 34.4% latency reduction over baselines.

02. Improves the cache hit rate through query prediction.

03. Maintains latency performance under dynamic system loads.

Abstract

Retrieval-augmented generation (RAG) is widely used as a de facto paradigm in large language model (LLM)-driven applications on mobile devices, such as mobile assistants that draw on personal emails or meeting records. However, because of lengthy prompts and resource constraints, mobile RAG systems exhibit high response latency. One promising approach to this problem is to reuse intermediate computational results across different queries to eliminate redundant computation. However, most existing approaches, such as KV cache reuse and semantic cache reuse, are designed for cloud settings and perform poorly on mobile devices because they overlook the distinctive characteristics of mobile RAG. We propose PerCache, a novel hierarchical cache solution designed to reduce the end-to-end latency of personalized RAG applications on mobile platforms. PerCache adopts a hierarchical architecture…
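
As a rough illustration of the semantic cache reuse that the abstract mentions as an existing approach, the sketch below returns a stored answer when a new query is similar enough to an earlier one. It is a generic toy and not PerCache's design: the names `embed`, `SemanticCache`, `answer_query`, and `slow_rag`, the bag-of-words similarity, and the 0.8 threshold are all assumptions made for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a real sentence encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class SemanticCache:
    """Hypothetical cache mapping past queries to their generated answers."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # assumed similarity cutoff for a hit
        self.entries = []           # list of (query vector, answer) pairs

    def get(self, query):
        """Return the answer of the most similar cached query, or None on a miss."""
        q = embed(query)
        best_score, best_answer = 0.0, None
        for vec, answer in self.entries:
            score = cosine(q, vec)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None

    def put(self, query, answer):
        self.entries.append((embed(query), answer))


def answer_query(query, cache, slow_rag):
    """Serve from the cache when possible; otherwise run the full RAG pipeline."""
    cached = cache.get(query)
    if cached is not None:
        return cached, True       # hit: retrieval and generation are skipped
    result = slow_rag(query)      # miss: pay the full latency once
    cache.put(query, result)
    return result, False
```

In this sketch, `slow_rag` is any callable that runs retrieval and generation for a query. A repeated or near-identical query is then served from the cache, while an unrelated query falls through to the full pipeline.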

Taxonomy

Topics: Big Data and Digital Economy · Cloud Computing and Resource Management · Caching and Content Delivery