DynaKV: Enabling Accurate and Efficient Long-Sequence LLM Decoding on Smartphones
Tuowei Wang, Minxing Huang, Fengzu Li, Ligeng Chen, Jinrui Zhang, Ju Ren

TL;DR
DynaKV is an adaptive key-value cache management system that enhances long-sequence decoding efficiency and accuracy on smartphones by intelligently managing cache migration, flash storage, and memory resources.
Contribution
DynaKV introduces the first adaptive KVCache management approach designed specifically for long-sequence decoding on smartphones, addressing both accuracy and efficiency challenges.
Findings
Achieves 1.38× higher retrieval accuracy.
Cuts end-to-end latency by a factor of 1.47.
Extends applicability to other long-context workloads.
Abstract
As the demand for human-like reasoning, multi-turn dialogues, and long-form responses grows, large language models (LLMs) are increasingly expected to support efficient and effective long-sequence decoding. However, due to limited DRAM capacity, long-sequence LLM decoding on smartphones is constrained by the key-value cache (KVCache), whose memory footprint increases linearly with sequence length. Retrieval-based methods mitigate DRAM pressure by offloading KVCache to flash and retrieving query-relevant entries through cluster-based indexing. Unfortunately, as decoding progresses, KVCache distribution shifts render static or local cluster updates progressively misaligned, excluding essential entries or fetching redundant ones. These issues are further exacerbated by smartphone-specific limitations in bandwidth, IOPS, and memory capacity. We propose DynaKV, the first adaptive KVCache…
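The cluster-based retrieval that the abstract attributes to prior retrieval-based methods can be illustrated with a minimal sketch. This is a generic illustration, not DynaKV's algorithm: the function names (`build_index`, `retrieve`), the plain k-means clustering, the dot-product scoring, and the dictionary standing in for flash storage are all assumptions made for the example.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mean(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def build_index(keys, num_clusters, iters=10):
    """Hypothetical helper: plain k-means over key vectors.

    Returns (centroids, members), where members[c] lists the token
    indices assigned to cluster c. Only this index would stay in DRAM.
    """
    # Deterministic init: pick evenly spaced keys as starting centroids.
    step = len(keys) // num_clusters
    centroids = [list(keys[c * step]) for c in range(num_clusters)]
    members = [[] for _ in range(num_clusters)]
    for _ in range(iters):
        members = [[] for _ in range(num_clusters)]
        for idx, key in enumerate(keys):
            nearest = min(range(num_clusters),
                          key=lambda c: math.dist(key, centroids[c]))
            members[nearest].append(idx)
        # Recompute each centroid; keep the old one if a cluster is empty.
        centroids = [mean([keys[i] for i in m]) if m else centroids[c]
                     for c, m in enumerate(members)]
    return centroids, members

def retrieve(query, centroids, members, top_clusters):
    """Hypothetical helper: rank clusters by query-centroid dot product
    and return the token indices of the best `top_clusters` clusters."""
    ranked = sorted(range(len(centroids)),
                    key=lambda c: dot(query, centroids[c]), reverse=True)
    selected = []
    for c in ranked[:top_clusters]:
        selected.extend(members[c])
    return sorted(selected)

# Toy data: two well-separated groups of 2-D key vectors.
keys = [(1.0, 0.0), (0.9, 0.1), (1.1, -0.1),
        (0.0, 1.0), (0.1, 0.9), (-0.1, 1.1)]
# Stand-in for flash: full KV entries live here, keyed by token index.
flash_store = {i: {"key": k, "value": f"v{i}"} for i, k in enumerate(keys)}

centroids, members = build_index(keys, num_clusters=2)
hits = retrieve((1.0, 0.0), centroids, members, top_clusters=1)
fetched = [flash_store[i] for i in hits]  # only these entries are read back
print(hits)  # [0, 1, 2]
```

The sketch builds its index once and never updates it, which is the static case the abstract warns about: as new keys arrive during decoding and their distribution shifts, fixed centroids would match the data less well, so a query could miss relevant entries or pull in redundant ones.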
Taxonomy
Topics: Topic Modeling · Personal Information Management and User Behavior · Green IT and Sustainability
