Towards Privacy-Preserving LLM Inference via Covariant Obfuscation (Technical Report)
Yu Lin, Qizhi Zhang, Wenqiang Ruan, Daode Zhang, Jue Hong, Ye Wu, Hanning Xia, Yunlong Mao, Sheng Zhong

TL;DR
This paper introduces AloePri, a novel privacy-preserving inference method for large language models that maintains high accuracy, efficiency, and compatibility with existing infrastructures, while providing strong privacy guarantees.
Contribution
AloePri is the first method to enable privacy-preserving LLM inference that balances accuracy, efficiency, and infrastructure compatibility for industrial applications.
Findings
AloePri incurs only 0.0% to 3.5% accuracy loss on large models.
It achieves inference efficiency comparable to plaintext inference.
AloePri effectively resists state-of-the-art privacy attacks.
Abstract
The rapid development of large language models (LLMs) has driven the widespread adoption of cloud-based LLM inference services, but it has also introduced significant privacy risks from transmitting and processing private data during remote inference. For privacy-preserving LLM inference technologies to be practically applied in industrial scenarios, three core requirements must be satisfied simultaneously: (1) Accuracy and efficiency losses should be minimized to avoid degrading the service experience. (2) The inference process must be able to run on large-scale clusters consisting of heterogeneous legacy xPUs. (3) Compatibility with existing LLM infrastructures should be ensured so that their engineering optimizations can be reused. To the best of our knowledge, none of the existing privacy-preserving LLM inference methods satisfy all the above constraints while delivering meaningful privacy…
