DynaKV: Enabling Accurate and Efficient Long-Sequence LLM Decoding on Smartphones

Tuowei Wang; Minxing Huang; Fengzu Li; Ligeng Chen; Jinrui Zhang; Ju Ren

arXiv:2511.07427 · cs.DC · November 12, 2025

Open Access

TL;DR

DynaKV is an adaptive key-value cache management system that enhances long-sequence decoding efficiency and accuracy on smartphones by intelligently managing cache migration, flash storage, and memory resources.

Contribution

It introduces the first adaptive KVCache management approach specifically designed for long-sequence decoding on smartphones, addressing accuracy and efficiency challenges.

Findings

01. Achieves 1.38× higher retrieval accuracy.

02. Reduces end-to-end latency by 1.47×.

03. Extends applicability to other long-context workloads.

Abstract

As the demand for human-like reasoning, multi-turn dialogues, and long-form responses grows, large language models (LLMs) are increasingly expected to support efficient and effective long-sequence decoding. However, due to limited DRAM capacity, long-sequence LLM decoding on smartphones is constrained by the key-value cache (KVCache), whose memory footprint increases linearly with sequence length. Retrieval-based methods mitigate DRAM pressure by offloading KVCache to flash and retrieving query-relevant entries through cluster-based indexing. Unfortunately, as decoding progresses, KVCache distribution shifts render static or local cluster updates progressively misaligned, excluding essential entries or fetching redundant ones. These issues are further exacerbated by smartphone-specific limitations in bandwidth, IOPS, and memory capacity. We propose DynaKV, the first adaptive KVCache…
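To make the abstract's point about linear growth concrete, the following is a rough sketch of how KVCache size scales with sequence length in a standard transformer decoder. The model shape (32 layers, 8 key-value heads, head dimension 128, 2-byte elements) is a hypothetical configuration chosen only for illustration; it is not taken from the paper, and the function name `kv_cache_bytes` is likewise an assumption.

```python
def kv_cache_bytes(seq_len, num_layers, num_kv_heads, head_dim, bytes_per_elem=2):
    """Bytes needed to store keys and values for seq_len tokens."""
    # Factor 2: each layer stores one key tensor and one value tensor.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem


# Hypothetical model shape, for illustration only (not from the paper).
cfg = dict(num_layers=32, num_kv_heads=8, head_dim=128)

for n in (4096, 32768, 131072):
    gib = kv_cache_bytes(n, **cfg) / 2**30
    print(f"{n:>7} tokens -> {gib:.2f} GiB")
```

Under these assumed dimensions each token adds 128 KiB, so the cache grows from 0.5 GiB at 4,096 tokens to 16 GiB at 131,072 tokens, which is the kind of DRAM pressure that motivates offloading entries to flash.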

Taxonomy

Topics: Topic Modeling · Personal Information Management and User Behavior · Green IT and Sustainability