Fully Homomorphic Encryption on Llama 3 model for privacy preserving LLM inference

Anes Abdennebi; Nadjia Kara; Laaziz Lahlou

arXiv:2604.12168 · cs.CR · April 15, 2026

TL;DR

This paper integrates post-quantum, lattice-based fully homomorphic encryption (FHE) into the Llama 3 inference pipeline, enabling privacy-preserving LLM inference with up to 98% text generation accuracy and an inference latency of 237 ms on an i9 CPU.

Contribution

The paper introduces a method for securing Llama 3 inference with fully homomorphic encryption built on post-quantum (lattice-based) cryptography, so that computation can run on encrypted data and address security concerns in AI applications.
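To make the idea of computing on lattice-based ciphertexts concrete, here is a minimal sketch of a toy LWE-style (Learning With Errors) symmetric scheme that evaluates an integer dot product, the core operation of a linear layer, without decrypting the inputs. It is not the paper's construction: the parameters `N`, `Q`, `T`, `DELTA` and every function name are hypothetical, the scheme is only additively homomorphic (not fully homomorphic), and the parameters are far too small to be secure.

```python
import random

# Toy parameters, chosen for readability only (assumptions, NOT secure).
N = 16            # lattice dimension
Q = 1 << 20       # ciphertext modulus
T = 256           # plaintext modulus
DELTA = Q // T    # scaling factor that lifts the message above the noise


def keygen(rng):
    """Sample a uniform secret vector in Z_Q^N."""
    return [rng.randrange(Q) for _ in range(N)]


def encrypt(m, s, rng):
    """LWE encryption: b = <a, s> + DELTA*m + e (mod Q), with small noise e."""
    a = [rng.randrange(Q) for _ in range(N)]
    e = rng.randint(-2, 2)
    b = (sum(ai * si for ai, si in zip(a, s)) + DELTA * (m % T) + e) % Q
    return a, b


def decrypt(ct, s):
    """Remove the mask <a, s>, then round away the noise."""
    a, b = ct
    phase = (b - sum(ai * si for ai, si in zip(a, s))) % Q
    return ((phase + DELTA // 2) // DELTA) % T


def add(ct1, ct2):
    """Homomorphic addition: add ciphertexts componentwise."""
    (a1, b1), (a2, b2) = ct1, ct2
    return [(x + y) % Q for x, y in zip(a1, a2)], (b1 + b2) % Q


def scale(ct, k):
    """Multiply an encrypted value by a plaintext integer k."""
    a, b = ct
    return [(k * ai) % Q for ai in a], (k * b) % Q


def encrypted_dot(cts, weights):
    """Dot product of an encrypted vector with plaintext integer weights."""
    acc = ([0] * N, 0)  # trivial encryption of zero
    for ct, w in zip(cts, weights):
        acc = add(acc, scale(ct, w))
    return acc
```

In this sketch a client would encrypt its inputs with `encrypt`, a server would call `encrypted_dot` with its plaintext weights, and only the holder of the secret key could recover the result with `decrypt`. Decryption stays correct only while the accumulated noise remains below `DELTA / 2`, which is why real FHE schemes need careful parameter selection and noise management.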

Findings

1. Achieved up to 98% text generation accuracy
2. Maintained inference latency of 237 ms on an i9 CPU
3. Reached up to 80 tokens per second with FHE-secured inference

Abstract

The applications of Generative Artificial Intelligence (GenAI) and their intersections with data-driven fields, such as healthcare, finance, transportation, and information security, have led to significant improvements in service efficiency and latency. However, this synergy raises serious concerns regarding the security of large language models (LLMs) and their potential impact on the privacy of company and user data. Many technology companies that incorporate LLMs into their services with a certain level of command and control bear a risk of data exposure and disclosure of secrets caused by insecure LLM pipelines, which leaves them vulnerable to attacks such as data poisoning, prompt injection, and model theft. Although several security techniques (input/output sanitization, decentralized learning, access control management, and encryption) were implemented to reduce this risk,…
