Power-Softmax: Towards Secure LLM Inference over Encrypted Data

Itamar Zimerman; Allon Adir; Ehud Aharoni; Matan Avitan; Moran Baruch; Nir Drucker; Jenny Lerner; Ramy Masalha; Reut Meiri; Omri Soceanu

arXiv:2410.09457·cs.LG·May 6, 2026


TL;DR

This paper introduces a new polynomial-friendly self-attention mechanism that enables secure inference of large language models over encrypted data while achieving reasoning abilities comparable to those of standard models.

Contribution

It presents the first polynomial LLMs with over a billion parameters, surpassing previous polynomial models in size while maintaining strong reasoning and in-context learning (ICL) capabilities.

Findings

01

First polynomial LLMs over a billion parameters.

02

The models demonstrate reasoning and ICL capabilities comparable to those of standard transformers.

03

Provides a latency analysis of encrypted inference.

Abstract

Modern cryptographic methods for implementing privacy-preserving LLMs, such as homomorphic encryption (HE), require the LLMs to have a polynomial form. Forming such a representation is challenging because transformers include non-polynomial components, such as Softmax and layer normalization. Previous approaches have either directly approximated pre-trained models with high-degree polynomials, which are less efficient over HE, or replaced non-polynomial components with easier-to-approximate primitives before training, e.g., Softmax with pointwise attention. The latter approach may introduce scalability challenges. We present a new HE-friendly variant of self-attention that offers a stable form for training and is easy to approximate with polynomials for secure inference. Our work introduces the first polynomial LLMs with over a billion parameters, exceeding the size of previous models by more than tenfold.…
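The abstract does not spell out the attention formula, so the following is only a minimal plaintext sketch under stated assumptions, not the authors' method. It contrasts standard Softmax, whose exponential is not a polynomial, with a hypothetical power-based variant that replaces exp(s) with s**p. The function name power_softmax and the choice of a positive even integer exponent p are assumptions suggested by the title, not details taken from the paper.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Standard softmax. exp() is not a polynomial, so evaluating it
    under HE requires a (typically high-degree) polynomial approximation."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def power_softmax(scores: List[float], p: int = 4) -> List[float]:
    """Hypothetical power-based variant (an assumption, not the paper's
    exact definition): replaces exp(s) with s**p.

    With a positive even integer p, s**p is a polynomial in s and is
    non-negative, so the outputs still sum to 1. The division by the sum
    is still non-polynomial and would need its own approximation under
    HE; this plaintext sketch computes it exactly.
    """
    if p <= 0 or p % 2 != 0:
        raise ValueError("p must be a positive even integer")
    powers = [s ** p for s in scores]
    total = sum(powers)
    if total == 0:
        raise ValueError("all scores are zero; the distribution is undefined")
    return [w / total for w in powers]


scores = [2.0, 1.0, -0.5]
print([round(w, 3) for w in softmax(scores)])
print([round(w, 3) for w in power_softmax(scores, p=2)])
```

One design consequence is worth noting: unlike exp, s**p with even p depends only on |s|, so a large negative score receives a large weight. Any practical power-based attention has to account for this; how the paper handles it is not stated in this summary.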
