Fully Homomorphic Encryption on Llama 3 model for privacy preserving LLM inference
Anes Abdennebi, Nadjia Kara, Laaziz Lahlou

TL;DR
This paper demonstrates the integration of post-quantum lattice-based homomorphic encryption into the Llama 3 model's inference pipeline, enabling privacy-preserving large language model inference with high accuracy and reasonable latency.
Contribution
The paper introduces a method for securing Llama 3 inference with fully homomorphic encryption built on post-quantum, lattice-based cryptography, addressing security concerns in AI applications.
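To make the idea of lattice-based homomorphic encryption concrete, the following is a minimal sketch of a toy LWE-style scheme in which two encrypted integers are added without decrypting them. It is not the scheme, library, or parameter set used in the paper (none of that is given here). All names (`keygen`, `encrypt`, `decrypt`, `add`) and all parameters (`Q`, `T`, `N`, the noise range) are assumptions chosen for illustration, and the parameters are far too small to be secure.

```python
import random

# Toy parameters (assumptions for illustration only; NOT secure).
Q = 2**15        # ciphertext modulus
T = 16           # plaintext modulus: messages are integers in [0, T)
N = 32           # dimension of the secret vector
DELTA = Q // T   # scaling factor that lifts the message above the noise


def keygen(rng):
    """Sample a binary secret key of length N."""
    return [rng.randrange(2) for _ in range(N)]


def inner(a, secret):
    """Inner product <a, secret> modulo Q."""
    return sum(x * s for x, s in zip(a, secret)) % Q


def encrypt(secret, message, rng):
    """Encrypt as (a, b) with b = <a, s> + noise + DELTA * message (mod Q)."""
    a = [rng.randrange(Q) for _ in range(N)]
    noise = rng.randint(-4, 4)  # small error term
    b = (inner(a, secret) + noise + DELTA * (message % T)) % Q
    return a, b


def decrypt(secret, ciphertext):
    """Remove the mask <a, s>, then round away the noise."""
    a, b = ciphertext
    phase = (b - inner(a, secret)) % Q
    return ((phase + DELTA // 2) // DELTA) % T


def add(ct1, ct2):
    """Homomorphic addition: add ciphertexts component-wise modulo Q."""
    (a1, b1), (a2, b2) = ct1, ct2
    return [(x + y) % Q for x, y in zip(a1, a2)], (b1 + b2) % Q


rng = random.Random(0)
secret = keygen(rng)
ct_sum = add(encrypt(secret, 5, rng), encrypt(secret, 9, rng))
result = decrypt(secret, ct_sum)  # sum of the plaintexts, modulo T
```

Decryption stays correct as long as the accumulated noise is below `DELTA / 2`. Each addition grows the noise, and multiplication grows it much faster, which is why fully homomorphic schemes need noise-management techniques that this sketch omits.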
Findings
Achieved up to 98% text generation accuracy
Maintained inference latency of 237 ms on an i9 CPU
Reached up to 80 tokens per second with FHE-secured inference
Abstract
The applications of Generative Artificial Intelligence (GenAI) and their intersections with data-driven fields, such as healthcare, finance, transportation, and information security, have led to significant improvements in service efficiency and latency. However, this synergy raises serious concerns regarding the security of large language models (LLMs) and their potential impact on the privacy of companies' and users' data. Many technology companies that incorporate LLMs in their services with a certain level of command and control bear a risk of data exposure and disclosure of secrets caused by insecure LLM pipelines, making them vulnerable to multiple attacks such as data poisoning, prompt injection, and model theft. Although several security techniques (input/output sanitization, decentralized learning, access control management, and encryption) have been implemented to reduce this risk,…
