SecureInfer: Heterogeneous TEE-GPU Architecture for Privacy-Critical Tensors for Large Language Model Deployment

Tushar Nayan (1); Ziqi Zhang (2); Ruimin Sun (1) ((1) Florida International University; (2) University of Illinois Urbana-Champaign)

arXiv:2510.19979·cs.CR·October 24, 2025

TL;DR

SecureInfer introduces a hybrid TEE-GPU architecture that keeps privacy-critical LLM components inside a trusted execution environment while offloading compute-intensive operations to untrusted accelerators, aiming for secure, high-performance on-device inference that resists model extraction.

Contribution

It proposes a novel information-theoretic, threat-informed partitioning scheme and implements a prototype for secure LLM inference on mobile and edge devices.

Findings

01

SecureInfer achieves strong security guarantees for LLM inference.

02

The framework maintains reasonable performance levels.

03

Prototype evaluation demonstrates practical deployment feasibility.

Abstract

With the increasing deployment of Large Language Models (LLMs) on mobile and edge platforms, securing them against model extraction attacks has become a pressing concern. However, protecting model privacy without sacrificing the performance benefits of untrusted AI accelerators, such as GPUs, presents a challenging trade-off. In this paper, we initiate the study of secure, high-performance execution of LLMs and present SecureInfer, a hybrid framework that leverages a heterogeneous Trusted Execution Environment (TEE)-GPU architecture to isolate privacy-critical components while offloading compute-intensive operations to untrusted accelerators. Building upon an outsourcing scheme, SecureInfer adopts an information-theoretic and threat-informed partitioning strategy: security-sensitive components, including non-linear layers, attention-head projections, FFN transformations, and LoRA adapters,…
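The abstract is cut off before it states exactly which operations run where, so the Python sketch below only illustrates the general shape of a TEE/GPU partitioning plan. The `Op` type, the kind labels, and the `partition` function are hypothetical. The four sensitive kinds mirror the components listed in the abstract; treating every other compute-intensive operation (here labelled `"offloadable"`) as safe to send to the GPU is an assumption, not SecureInfer's actual policy, and the outsourcing scheme itself is not modelled.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical kind labels. The first four mirror the security-sensitive
# components named in the abstract; their exact boundaries are assumptions.
SENSITIVE_KINDS = {"nonlinear", "attn_head_proj", "ffn", "lora"}
# Assumed label for compute-intensive work that may go to the untrusted GPU.
OFFLOADABLE_KINDS = {"offloadable"}


@dataclass(frozen=True)
class Op:
    name: str  # illustrative operation label, e.g. "layer0.softmax"
    kind: str  # must be in SENSITIVE_KINDS or OFFLOADABLE_KINDS


def partition(ops: List[Op]) -> Dict[str, List[str]]:
    """Assign each op to the TEE or the GPU, preserving execution order.

    Unknown kinds raise instead of defaulting to the GPU, so the plan
    fails closed rather than leaking an unclassified op.
    """
    plan: Dict[str, List[str]] = {"tee": [], "gpu": []}
    for op in ops:
        if op.kind in SENSITIVE_KINDS:
            plan["tee"].append(op.name)
        elif op.kind in OFFLOADABLE_KINDS:
            plan["gpu"].append(op.name)
        else:
            raise ValueError(f"unclassified op kind: {op.kind!r} ({op.name})")
    return plan


if __name__ == "__main__":
    # A hypothetical slice of one transformer block.
    block = [
        Op("layer0.head_proj", "attn_head_proj"),
        Op("layer0.qk_matmul", "offloadable"),
        Op("layer0.softmax", "nonlinear"),
        Op("layer0.av_matmul", "offloadable"),
        Op("layer0.ffn_up", "ffn"),
        Op("layer0.gelu", "nonlinear"),
        Op("layer0.lora_delta", "lora"),
    ]
    print(partition(block))
```

Rejecting unknown kinds, rather than silently sending them to the GPU, keeps the sketch fail-closed: a newly added layer type cannot reach the untrusted accelerator until someone classifies it explicitly.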

Taxonomy

Topics: Security and Verification in Computing · Adversarial Robustness in Machine Learning · Cryptography and Data Security