TZ-LLM: Protecting On-Device Large Language Models with Arm TrustZone
Xunjie Wang, Jiacheng Shi, Zihan Zhao, Yang Yu, Zhichao Hua, Jinyu Gu

TL;DR
This paper presents TZ-LLM, a system that protects on-device large language models with Arm TrustZone. Pipelined parameter restoration and a co-driver design for NPU sharing reduce time to first token (TTFT) by up to 90.9% and raise decoding speed by up to 23.2%.
Contribution
The paper introduces pipelined restoration and a co-driver design to efficiently protect LLMs within TrustZone, addressing memory and NPU sharing challenges.
Findings
TTFT reduced by up to 90.9%
Decoding speed increased by up to 23.2%
System successfully implemented on Arm devices
Abstract
Large Language Models (LLMs) deployed on mobile devices offer benefits like user privacy and reduced network latency, but introduce a significant security risk: the leakage of proprietary models to end users. To mitigate this risk, we propose a system design for protecting on-device LLMs using Arm Trusted Execution Environment (TEE), TrustZone. Our system addresses two primary challenges: (1) The dilemma between memory efficiency and fast inference (caching model parameters within TEE memory). (2) The lack of efficient and secure Neural Processing Unit (NPU) time-sharing between Rich Execution Environment (REE) and TEE. Our approach incorporates two key innovations. First, we employ pipelined restoration, leveraging the deterministic memory access patterns of LLM inference to prefetch parameters on demand, hiding memory allocation, I/O and decryption latency under computation time.…
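The pipelined restoration idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the XOR placeholder standing in for decryption, and the `depth` parameter are all hypothetical, chosen only to show how a known layer order lets restoration of upcoming parameters overlap with computation on the current ones.

```python
import queue
import threading


def restore_layer(encrypted_layer, key):
    # Hypothetical stand-in for allocation + I/O + decryption.
    # XOR is a placeholder only, not real cryptography.
    return bytes(b ^ key for b in encrypted_layer)


def compute_layer(params, activation):
    # Hypothetical stand-in for one layer of inference.
    return activation + sum(params)


def pipelined_inference(encrypted_layers, key, activation=0, depth=2):
    # Bounded queue: at most `depth` restored layers wait in memory at once.
    ready = queue.Queue(maxsize=depth)

    def prefetcher():
        # The layer order is fixed in advance, so upcoming layers can be
        # restored while earlier ones are still being computed.
        for layer in encrypted_layers:
            ready.put(restore_layer(layer, key))

    worker = threading.Thread(target=prefetcher, daemon=True)
    worker.start()
    for _ in encrypted_layers:
        params = ready.get()  # blocks only if restoration has fallen behind
        activation = compute_layer(params, activation)
    worker.join()
    return activation
```

In this sketch the bounded queue models the trade-off the abstract names: a larger `depth` hides more restoration latency but holds more decrypted parameters in memory at the same time.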
Taxonomy
Topics: Security and Verification in Computing · Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques
