PermLLM: Private Inference of Large Language Models within 3 Seconds under WAN
Fei Zheng, Chaochao Chen, Zhongxuan Han, Xiaolin Zheng

TL;DR
PermLLM combines secure random permutation with optimized secret sharing protocols and homomorphic encryption to run private inference of large language models such as ChatGLM-6B in about 3 seconds per token under WAN, whereas earlier MPC-based methods take several minutes per token.
Contribution
The paper proposes PermLLM, a two-party private inference method that uses secure random permutation to accelerate the evaluation of non-linear functions, combined with optimized secret sharing protocols and homomorphic encryption.
Findings
Achieves private inference of ChatGLM-6B in about 3 seconds per token.
Runs much faster than existing MPC solutions, which need gigabytes of data transfer and several minutes to generate a single token.
Remains efficient under wide-area network (WAN) conditions.
Abstract
The emergence of ChatGPT marks the arrival of the large language model (LLM) era. While LLMs demonstrate their power in a variety of fields, they also raise serious privacy concerns because users' queries are sent to the model provider. On the other hand, deploying the LLM on the user's device would leak all the model data. Existing methods based on secure multiparty computation (MPC) manage to protect the privacy of both the model parameters and the user queries. However, they require gigabytes of data transfer and several minutes to generate just one token, making them impractical for most real-world applications. To improve the efficiency of private LLM inference, we propose PermLLM, which accelerates the evaluation of non-linear functions using secure random permutation. Along with optimized secret sharing protocols and homomorphic encryption, PermLLM achieves two-party private…
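The abstract names two building blocks, additive secret sharing and random permutation, without giving protocol details. The toy sketch below shows only why a permutation can help with elementwise non-linear functions: such functions commute with a reordering of their inputs, so a function evaluated on a shuffled vector can be un-shuffled afterwards. Everything here is an assumption made for illustration (the ring size `MOD`, the fixed-point `SCALE`, the helper names, and the choice of GELU as the non-linearity). It runs in a single process, offers no security, and is not the paper's protocol.

```python
import math
import random

MOD = 2 ** 32    # ring for additive shares (illustrative assumption)
SCALE = 2 ** 12  # fixed-point scaling factor (illustrative assumption)


def encode(x):
    """Map a float to a fixed-point ring element."""
    return round(x * SCALE) % MOD


def decode(v):
    """Map a ring element back to a float, treating the upper half as negative."""
    if v >= MOD // 2:
        v -= MOD
    return v / SCALE


def share(v, rng):
    """Split v into two additive shares that sum to v modulo MOD."""
    r = rng.randrange(MOD)
    return r, (v - r) % MOD


def reconstruct(a, b):
    return (a + b) % MOD


def gelu(x):
    """Tanh approximation of GELU, used here as a sample non-linearity."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


def permuted_elementwise(xs, f, rng):
    """Apply f elementwise to secret-shared xs by way of a random permutation."""
    n = len(xs)
    shares = [share(encode(x), rng) for x in xs]

    # Random permutation. A real protocol would apply it obliviously;
    # here it is applied in the clear to both share vectors.
    perm = list(range(n))
    rng.shuffle(perm)
    s0 = [shares[perm[i]][0] for i in range(n)]
    s1 = [shares[perm[i]][1] for i in range(n)]

    # Only the shuffled values are opened, and f is evaluated in plaintext.
    shuffled_out = [f(decode(reconstruct(a, b))) for a, b in zip(s0, s1)]

    # Re-share the outputs and undo the permutation on the shares.
    out_shares = [share(encode(y), rng) for y in shuffled_out]
    restored = [None] * n
    for i in range(n):
        restored[perm[i]] = out_shares[i]
    return [decode(reconstruct(a, b)) for a, b in restored]
```

Calling `permuted_elementwise([-1.5, 0.0, 0.25, 2.0], gelu, random.Random(0))` should return values close to applying `gelu` directly, up to fixed-point rounding. The sketch says nothing about how much a shuffled vector leaks or how the permutation is kept secret; those questions are what the paper's secure permutation protocol addresses.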
Taxonomy
Topics: Privacy-Preserving Technologies in Data
