SecureInfer: Heterogeneous TEE-GPU Architecture for Privacy-Critical Tensors for Large Language Model Deployment
Tushar Nayan (1), Ziqi Zhang (2), Ruimin Sun (1) ((1) Florida International University, (2) University of Illinois Urbana-Champaign)

TL;DR
SecureInfer introduces a hybrid TEE-GPU architecture for privacy-critical LLM components, enabling secure, high-performance on-device inference without exposing sensitive data to untrusted accelerators.
Contribution
It proposes a novel threat-informed partitioning scheme and implements a prototype for secure LLM inference on mobile and edge devices.
Findings
SecureInfer achieves strong security guarantees for LLM inference.
The framework maintains reasonable performance levels.
Prototype evaluation demonstrates practical deployment feasibility.
Abstract
With the increasing deployment of Large Language Models (LLMs) on mobile and edge platforms, securing them against model extraction attacks has become a pressing concern. However, protecting model privacy without sacrificing the performance benefits of untrusted AI accelerators, such as GPUs, presents a challenging trade-off. In this paper, we initiate the study of high-performance execution on LLMs and present SecureInfer, a hybrid framework that leverages a heterogeneous Trusted Execution Environments (TEEs)-GPU architecture to isolate privacy-critical components while offloading compute-intensive operations to untrusted accelerators. Building upon an outsourcing scheme, SecureInfer adopts an information-theoretic and threat-informed partitioning strategy: security-sensitive components, including non-linear layers, projection of attention head, FNN transformations, and LoRA adapters,…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSecurity and Verification in Computing · Adversarial Robustness in Machine Learning · Cryptography and Data Security
