From Volume to Value: Preference-Aligned Memory Construction for On-Device RAG

Changmin Lee; Jaemin Kim; Taesik Gong

arXiv:2605.18271·cs.CL·May 19, 2026

TL;DR

This paper introduces EPIC, a memory-efficient method for retrieving personal context in on-device LLMs. It sharply reduces indexing memory while improving preference-following accuracy and lowering retrieval latency.

Contribution

EPIC selectively retains only preference-relevant data when constructing the retrieval index, so that on-device retrieval stays effective with low memory use and latency.
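The idea of retaining only preference-relevant data can be illustrated with a toy sketch. This is not the paper's algorithm: the bag-of-words scoring, the `threshold` value, and the names `build_index` and `retrieve` are all assumptions made for illustration, standing in for whatever relevance model and index EPIC actually uses.

```python
import math
import re
from collections import Counter


def bow(text):
    """Lowercased bag-of-words vector as a Counter (toy stand-in for an embedding)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two Counter vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks, preferences, threshold=0.2):
    """Keep only chunks whose best similarity to any preference meets the threshold.

    The threshold is a hypothetical parameter, not a value from the paper.
    """
    pref_vecs = [bow(p) for p in preferences]
    index = []
    for chunk in chunks:
        vec = bow(chunk)
        score = max((cosine(vec, p) for p in pref_vecs), default=0.0)
        if score >= threshold:
            index.append((chunk, vec))  # everything else is never stored
    return index


def retrieve(index, query, k=1):
    """Rank the retained chunks by similarity to the query and return the top k."""
    q = bow(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


# Hypothetical example data.
preferences = ["prefers vegetarian food"]
chunks = [
    "User said they only eat vegetarian food at restaurants",
    "The meeting was moved to Thursday afternoon",
    "Package tracking number was sent by email",
]
index = build_index(chunks, preferences)
top = retrieve(index, "suggest a restaurant for dinner")
```

In this sketch the memory saving comes entirely from filtering before indexing: chunks unrelated to any stated preference are dropped rather than embedded and stored, so the index grows with the amount of preference-relevant data instead of the raw data volume.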

Findings

01. Reduces indexing memory by a factor of 2404

02. Improves preference-following accuracy by 20.17 percentage points

03. Lowers retrieval latency by a factor of 33.33

Abstract

With the rapid emergence of personal AI agents based on Large Language Models (LLMs), implementing them on-device has become essential for privacy and responsiveness. To handle the inherently personal and context-dependent nature of real-world requests, such agents must ground their generation in device-resident personal context. However, under tight memory budgets, the core bottleneck is what to store so that retrieval remains aligned with the user. We propose EPIC (Efficient Preference-aligned Index Construction), which focuses on user preferences as a compact and stable form of personal context and integrates them throughout the RAG pipeline. EPIC selectively retains preference-relevant information from raw data and aligns retrieval toward preference-aligned contexts. Across four benchmarks covering conversations, debates, explanations, and recommendations, EPIC reduces indexing…
