Your Inference Request Will Become a Black Box: Confidential Inference for Cloud-based Large Language Models

Chung-ju Huang; Huiqiang Zhao; Yuanpeng He; Lijian Li; Wenpin Jiao; Zhi Jin; Peixuan Chen; Leye Wang

arXiv:2603.00196·cs.CR·March 3, 2026

Open Access

TL;DR

This paper introduces Talaria, a framework for confidential inference on cloud-based LLMs that protects client data and model privacy without sacrificing performance or efficiency.

Contribution

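The paper proposes Talaria, which partitions the LLM pipeline between a client-controlled Confidential Virtual Machine (CVM) and cloud GPUs, and the Reversible Masked Outsourcing (ReMO) protocol, which masks the data exchanged between the two. Together they enable secure, lossless inference with strong privacy guarantees against inference attacks.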

Findings

01. Reduces token inference attack accuracy from 97.5% to 1.34%.

02. Maintains identical output to original models.

03. Ensures privacy without significant efficiency loss.

Abstract

The increasing reliance on cloud-hosted Large Language Models (LLMs) exposes sensitive client data, such as prompts and responses, to potential privacy breaches by service providers. Existing approaches fail to ensure privacy, maintain model performance, and preserve computational efficiency simultaneously. To address this challenge, we propose Talaria, a confidential inference framework that partitions the LLM pipeline to protect client data without compromising the cloud's model intellectual property or inference quality. Talaria executes sensitive, weight-independent operations within a client-controlled Confidential Virtual Machine (CVM) while offloading weight-dependent computations to the cloud GPUs. The interaction between these environments is secured by our Reversible Masked Outsourcing (ReMO) protocol, which uses a hybrid masking technique to reversibly obscure intermediate…
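The idea of reversibly masking data before handing a weight-dependent step to the cloud can be illustrated with a toy example. The sketch below is not the paper's ReMO protocol, whose details are not given in this summary. It assumes a hypothetical mask made of a secret row permutation plus a secret per-row scale factor, applied to a small matrix of hidden states before a linear layer. The names `mask`, `unmask`, and `cloud_linear` are invented for illustration, and this simple mask carries none of the security guarantees claimed for ReMO. It only shows why such a mask can be undone exactly: a linear layer acts on each row independently, so permuting and scaling rows commutes with it.

```python
import random
from fractions import Fraction


def matmul(a, b):
    """Plain matrix product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def mask(hidden, rng):
    """Client side (hypothetical): shuffle rows and scale each by a secret factor."""
    n = len(hidden)
    perm = list(range(n))
    rng.shuffle(perm)
    # Exact rational scale factors keep the round trip lossless in this toy setting.
    scales = [Fraction(rng.randint(2, 97), rng.randint(2, 97)) for _ in range(n)]
    masked = [[scales[i] * v for v in hidden[perm[i]]] for i in range(n)]
    return masked, (perm, scales)


def cloud_linear(masked, weights):
    """Cloud side: applies its private weights to the masked rows only."""
    return matmul(masked, weights)


def unmask(masked_out, key):
    """Client side: undo the scaling, then put rows back in their original order."""
    perm, scales = key
    out = [None] * len(perm)
    for i, src in enumerate(perm):
        out[src] = [v / scales[i] for v in masked_out[i]]
    return out


rng = random.Random(0)
hidden = [[Fraction(1), Fraction(2)],
          [Fraction(3), Fraction(4)],
          [Fraction(5), Fraction(6)]]
weights = [[Fraction(1), Fraction(0), Fraction(2)],
           [Fraction(-1), Fraction(3), Fraction(1)]]

masked, key = mask(hidden, rng)
recovered = unmask(cloud_linear(masked, weights), key)

# The unmasked result matches what an unmasked linear layer would produce.
assert recovered == matmul(hidden, weights)
```

Exact rationals are used here only so that the equality check is literal. With floating-point tensors the same round trip would hold up to rounding error, which is one reason a real protocol needs care to keep outputs identical.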

Taxonomy

Topics: Big Data and Digital Economy · Cryptography and Data Security · Adversarial Robustness in Machine Learning