Scaling up Privacy-Preserving ML: A CKKS Implementation of Llama-2-7B
Jaiyoung Park, Sejin Park, Jai Hyun Park, Jung Ho Ahn, Jung Hee Cheon, Guillaume Hanrot, Jung Woo Kim, Minje Park, Damien Stehlé

TL;DR
This paper introduces a CKKS-based homomorphic encryption framework for privacy-preserving inference on large language models such as Llama-2-7B with inputs of thousands of tokens. It supports inputs in which only part of the tokens are encrypted, and it mitigates the impact of outlier values.
Contribution
It presents an unbalanced chunked prefill framework, new homomorphic algorithms, and outlier reduction techniques for efficient private LLM inference without retraining.
Findings
Supports up to 4096 tokens with partial encryption
Achieves inference in 85 s for a summarization task on a GPU cluster
Reduces outlier impact without retraining
Abstract
As large language models (LLMs) become ubiquitous, privacy concerns pertaining to inference inputs keep growing. In this context, fully homomorphic encryption (FHE) has emerged as a primary cryptographic solution to provide non-interactive confidential LLM inference. Existing solutions scale poorly with the input token length, and hence focus either on small models or larger models with a small number of input tokens. They also suffer from the existence of large outlier values. These values have a strong impact on the evaluation of non-linear layers, leading to large-degree polynomial approximation and thus heavy evaluation costs. We propose an FHE-based private LLM inference solution that allows thousands of input tokens with only a part of them being encrypted: this fits with a scenario where the context is benign and only part of the input is sensitive. To do so, we suggest an…
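The abstract's claim that outliers force large-degree polynomial approximations can be checked numerically. The sketch below is a generic illustration, not the authors' method: it uses Chebyshev interpolation (via NumPy's `Chebyshev.interpolate`) as a stand-in for whatever approximation the paper uses, and assumes an absolute error tolerance of 1e-3 and example input ranges of ±4, ±16 and ±64. Each range is meant to mimic a layer whose inputs must stay accurate up to the largest outlier. The function approximated is SiLU, the activation inside Llama-2's SwiGLU MLP. The helper names `max_abs_error` and `min_degree` are hypothetical.

```python
import numpy as np
from numpy.polynomial import chebyshev as C


def silu(x):
    """SiLU, i.e. x * sigmoid(x), the activation inside Llama-2's SwiGLU MLP."""
    return x / (1.0 + np.exp(-x))


def max_abs_error(func, deg, bound, samples=4001):
    """Worst-case error (on a dense grid) of the degree-`deg` Chebyshev
    interpolant of `func` on the interval [-bound, bound]."""
    poly = C.Chebyshev.interpolate(func, deg, domain=[-bound, bound])
    xs = np.linspace(-bound, bound, samples)
    return float(np.max(np.abs(poly(xs) - func(xs))))


def min_degree(func, bound, tol=1e-3, max_deg=512):
    """Smallest degree whose interpolant meets `tol` on [-bound, bound].
    Returns None if no degree up to `max_deg` is accurate enough.
    (Hypothetical helper; tol and max_deg are illustrative assumptions.)"""
    for deg in range(1, max_deg + 1):
        if max_abs_error(func, deg, bound) <= tol:
            return deg
    return None


if __name__ == "__main__":
    # Wider input ranges (larger outliers) need higher-degree polynomials
    # to reach the same absolute accuracy.
    for bound in (4, 16, 64):
        print(f"input range [-{bound}, {bound}]: degree {min_degree(silu, bound)}")
```

In CKKS, a higher polynomial degree translates into more ciphertext multiplications and more multiplicative depth (roughly logarithmic in the degree), and therefore into more bootstrapping. This is why shrinking the input range by taming outliers can lower the evaluation cost of non-linear layers.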
Taxonomy
Topics: Cryptography and Data Security · Privacy-Preserving Technologies in Data · Cryptography and Residue Arithmetic
