A First Look At Efficient And Secure On-Device LLM Inference Against KV Leakage
Huan Yang, Deyu Zhang, Yudong Zhao, Yuanchun Li, Yunxin Liu

TL;DR
This paper introduces KV-Shield, a method for secure on-device LLM inference that prevents KV pair leakage on GPUs by using permutation within a TEE, balancing privacy and performance.
Contribution
KV-Shield is a new approach that permutes weight matrices and attention vectors within a TEE to secure intermediate data during on-device LLM inference.
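The permutation idea can be illustrated with a small, self-contained sketch. This is not the authors' implementation: it assumes a single attention head, no positional encoding, and one shared column permutation applied to the query, key, and value weight matrices. All function and variable names are hypothetical. Under those assumptions the attention scores are unchanged (the same permutation cancels in the query-key dot product), the values and the attention output come out column-permuted, and applying the inverse permutation (the step that would run inside the TEE) restores the original output.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(r) for r in zip(*m)]

def softmax_rows(m):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in m:
        mx = max(row)
        exps = [math.exp(v - mx) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def permute_columns(m, perm):
    """Column j of the result is column perm[j] of m."""
    return [[row[p] for p in perm] for row in m]

def invert(perm):
    """Return the inverse permutation, so that inv[perm[j]] == j."""
    inv = [0] * len(perm)
    for j, p in enumerate(perm):
        inv[p] = j
    return inv

def attention(x, wq, wk, wv):
    """Single-head scaled dot-product attention (no mask, no positions)."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(q[0]))
    scores = [[s / scale for s in row] for row in matmul(q, transpose(k))]
    return matmul(softmax_rows(scores), v)

def random_matrix(rng, rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
n_tokens, d_model = 4, 6
x = random_matrix(rng, n_tokens, d_model)
wq = random_matrix(rng, d_model, d_model)
wk = random_matrix(rng, d_model, d_model)
wv = random_matrix(rng, d_model, d_model)

# Assumed "initialization phase": a secret permutation is applied to the weights once.
perm = list(range(d_model))
rng.shuffle(perm)
wq_p, wk_p, wv_p = (permute_columns(w, perm) for w in (wq, wk, wv))

# Untrusted side: only ever sees permuted Q, K, V and a permuted output.
shielded = attention(x, wq_p, wk_p, wv_p)

# Assumed trusted step: undo the permutation on the attention output.
restored = permute_columns(shielded, invert(perm))

plain = attention(x, wq, wk, wv)
max_err = max(abs(a - b) for ra, rb in zip(plain, restored) for a, b in zip(ra, rb))
print(f"max abs difference: {max_err:.2e}")
```

The sketch only demonstrates that the permuted computation is numerically equivalent to the unpermuted one; it says nothing about how hard the permutation is to recover, which is the subject of the paper's security analysis.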
Findings
KV-Shield effectively prevents KV pair leakage.
The method maintains inference accuracy and efficiency.
Theoretical analysis confirms correctness and security benefits.
Abstract
Running LLMs on end devices has garnered significant attention recently due to their advantages in privacy preservation. With the advent of lightweight LLM models and specially designed GPUs, on-device LLM inference has achieved the necessary accuracy and performance metrics. However, we have identified that LLM inference on GPUs can leak privacy-sensitive intermediate information, specifically the KV pairs. An attacker could exploit these KV pairs to reconstruct the entire user conversation, leading to significant vulnerabilities. Existing solutions, such as Fully Homomorphic Encryption (FHE) and Trusted Execution Environments (TEE), are either too computation-intensive or resource-limited. To address these issues, we designed KV-Shield, which operates in two phases. In the initialization phase, it permutes the weight matrices so that all KV pairs are correspondingly permuted. During…
Taxonomy
Topics: Smart Grid Security and Resilience · Software-Defined Networks and 5G · Network Security and Intrusion Detection
Methods: Softmax · Attention Is All You Need
