A First Look At Efficient And Secure On-Device LLM Inference Against KV Leakage

Huan Yang; Deyu Zhang; Yudong Zhao; Yuanchun Li; Yunxin Liu

arXiv:2409.04040·cs.CR·September 9, 2024

Open Access

TL;DR

This paper introduces KV-Shield, a novel method for secure on-device LLM inference that prevents KV pair leakage on GPUs by using permutation within TEE, balancing privacy and performance.

Contribution

KV-Shield is a new approach that permutes weight matrices and attention vectors within TEE to secure intermediate data during on-device LLM inference.

Findings

01. KV-Shield effectively prevents KV pair leakage.

02. The method maintains inference accuracy and efficiency.

03. Theoretical analysis confirms correctness and security benefits.

Abstract

Running LLMs on end devices has garnered significant attention recently due to their advantages in privacy preservation. With the advent of lightweight LLM models and specially designed GPUs, on-device LLM inference has achieved the necessary accuracy and performance metrics. However, we have identified that LLM inference on GPUs can leak privacy-sensitive intermediate information, specifically the KV pairs. An attacker could exploit these KV pairs to reconstruct the entire user conversation, leading to significant vulnerabilities. Existing solutions, such as Fully Homomorphic Encryption (FHE) and Trusted Execution Environments (TEE), are either too computation-intensive or resource-limited. To address these issues, we designed KV-Shield, which operates in two phases. In the initialization phase, it permutes the weight matrices so that all KV pairs are correspondingly permuted. During…
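The permutation idea in the abstract can be illustrated with a small numerical sketch. It is not the paper's implementation: it assumes a single attention head, one fixed column permutation shared by the query, key, and value weights, and plain Python lists in place of GPU tensors. All names (`attention`, `permute_columns`, `PERM`, and so on) are hypothetical. Under those assumptions, the keys and values computed from permuted weights are themselves permuted, the attention scores are unchanged, and applying the inverse permutation to the output recovers the original result.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def permute_columns(m, perm):
    """Return a copy of m whose column i is column perm[i] of m."""
    return [[row[j] for j in perm] for row in m]


def inverse_permutation(perm):
    """Return inv such that inv[perm[i]] == i."""
    inv = [0] * len(perm)
    for new_pos, old_pos in enumerate(perm):
        inv[old_pos] = new_pos
    return inv


def softmax(row):
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product attention; also returns K and V."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = math.sqrt(len(q[0]))
    scores = [[sum(a * b for a, b in zip(qi, kj)) / scale for kj in k] for qi in q]
    weights = [softmax(r) for r in scores]
    return matmul(weights, v), k, v


def random_matrix(rng, rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def all_close(a, b, tol=1e-9):
    return all(abs(p - q) <= tol for ra, rb in zip(a, b) for p, q in zip(ra, rb))


rng = random.Random(0)
TOKENS, D_MODEL, D_HEAD = 4, 6, 5
PERM = [2, 0, 4, 1, 3]  # illustrative secret permutation (assumption)

x = random_matrix(rng, TOKENS, D_MODEL)
w_q = random_matrix(rng, D_MODEL, D_HEAD)
w_k = random_matrix(rng, D_MODEL, D_HEAD)
w_v = random_matrix(rng, D_MODEL, D_HEAD)

# Reference computation with the original weights.
out_plain, k_plain, v_plain = attention(x, w_q, w_k, w_v)

# Same computation with column-permuted weights. In this sketch it stands in
# for the untrusted side, which only ever sees permuted K and V.
out_perm, k_perm, v_perm = attention(
    x,
    permute_columns(w_q, PERM),
    permute_columns(w_k, PERM),
    permute_columns(w_v, PERM),
)

# Undoing the permutation (assumed here to happen on the trusted side)
# recovers the reference output.
restored = permute_columns(out_perm, inverse_permutation(PERM))

print(all_close(k_perm, permute_columns(k_plain, PERM)))
print(all_close(restored, out_plain))
```

The scores survive because a dot product does not depend on the order of its terms as long as both vectors are reordered the same way. The sketch says nothing about how hard the permutation is to recover, which is the subject of the paper's security analysis.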


Taxonomy

Topics: Smart Grid Security and Resilience · Software-Defined Networks and 5G · Network Security and Intrusion Detection

Methods: Softmax · Attention Is All You Need