From Volume to Value: Preference-Aligned Memory Construction for On-Device RAG
Changmin Lee, Jaemin Kim, Taesik Gong

TL;DR
This paper introduces EPIC, a memory-efficient method for personal context retrieval in on-device LLMs, significantly reducing memory use while improving preference alignment and response speed.
Contribution
EPIC is a novel approach that selectively retains preference-relevant data, enabling effective on-device retrieval with minimal memory and latency.
Findings
Reduces indexing memory by a factor of 2,404
Improves preference-following accuracy by 20.17 percentage points
Cuts retrieval latency by a factor of 33.33
Abstract
With the rapid emergence of personal AI agents based on Large Language Models (LLMs), implementing them on-device has become essential for privacy and responsiveness. To handle the inherently personal and context-dependent nature of real-world requests, such agents must ground their generation in device-resident personal context. However, under tight memory budgets, the core bottleneck is what to store so that retrieval remains aligned with the user. We propose EPIC (Efficient Preference-aligned Index Construction), which focuses on user preferences as a compact and stable form of personal context and integrates them throughout the RAG pipeline. EPIC selectively retains preference-relevant information from raw data and aligns retrieval toward preference-aligned contexts. Across four benchmarks covering conversations, debates, explanations, and recommendations, EPIC reduces indexing…
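The abstract's core idea, keeping only preference-relevant information and retrieving against that smaller store, can be illustrated with a toy sketch. This is not EPIC's implementation: the cue-phrase filter, the bag-of-words overlap score, and every name below (`PREFERENCE_CUES`, `build_memory`, `retrieve`) are assumptions made up for illustration, and the paper's actual selection and alignment mechanisms may differ substantially.

```python
import re
from collections import Counter

# Hypothetical cue phrases that mark a sentence as stating a preference.
# A real system would likely use a learned or LLM-based filter; this
# keyword heuristic is an assumption for illustration only.
PREFERENCE_CUES = (
    "i prefer", "i like", "i love", "i hate",
    "i dislike", "i avoid", "i always", "i never",
)


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def is_preference(sentence):
    """Return True if the sentence contains any preference cue phrase."""
    lowered = sentence.lower()
    return any(cue in lowered for cue in PREFERENCE_CUES)


def build_memory(raw_sentences):
    """Keep only preference-bearing sentences, each with a bag-of-words vector."""
    return [(s, Counter(tokenize(s))) for s in raw_sentences if is_preference(s)]


def retrieve(memory, query, k=1):
    """Rank stored sentences by token overlap with the query; return the top k."""
    q = Counter(tokenize(query))
    scored = [(sum((q & vec).values()), s) for s, vec in memory]
    scored = [(score, s) for score, s in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:k]]


raw_log = [
    "The meeting moved to 3 pm on Tuesday.",
    "I prefer vegetarian restaurants when eating out.",
    "Traffic was heavy this morning.",
    "I never drink coffee after lunch.",
]
memory = build_memory(raw_log)
top = retrieve(memory, "Which restaurants should I book for dinner?")
print(len(raw_log), "raw sentences ->", len(memory), "stored")
print(top)
```

In this toy log, only two of the four sentences are stored, and the restaurant query retrieves the stored dining preference. The point of the sketch is the shape of the pipeline (filter before indexing, then retrieve from the filtered store), not the specific filter or scoring function.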
