Power-Softmax: Towards Secure LLM Inference over Encrypted Data
Itamar Zimerman, Allon Adir, Ehud Aharoni, Matan Avitan, Moran Baruch, Nir Drucker, Jenny Lerner, Ramy Masalha, Reut Meiri, Omri Soceanu

TL;DR
This paper introduces a polynomial-friendly self-attention mechanism that enables secure inference of large language models (LLMs) over encrypted data, with reasoning abilities comparable to those of standard models.
Contribution
It presents the first polynomial LLMs with over a billion parameters, more than ten times the size of previous polynomial models, while retaining strong reasoning and in-context learning (ICL) capabilities.
Findings
First polynomial LLMs over a billion parameters.
Models demonstrate reasoning and ICL comparable to standard transformers.
Provides latency analysis for encrypted inference.
Abstract
Modern cryptographic methods for implementing privacy-preserving LLMs, such as homomorphic encryption (HE), require the LLMs to have a polynomial form. Forming such a representation is challenging because transformers include non-polynomial components, such as Softmax and layer normalization. Previous approaches have either directly approximated pre-trained models with large-degree polynomials, which are less efficient over HE, or replaced non-polynomial components with easier-to-approximate primitives before training, e.g., Softmax with pointwise attention. The latter approach might introduce scalability challenges. We present a new HE-friendly variant of self-attention that offers a stable form for training and is easy to approximate with polynomials for secure inference. Our work introduces the first polynomial LLMs over a billion parameters, exceeding the size of previous models by more than tenfold.…
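To make the idea of a polynomial-friendly attention normalization concrete, here is a minimal NumPy sketch. It is an illustration only: this summary does not give the mechanism's formula. The function name `power_softmax`, the exponent `p`, and the stabilizer `eps` are hypothetical choices, and the form (raising scores to an even power in place of the exponential) is an assumption suggested by the paper's title.

```python
import numpy as np

def power_softmax(scores, p=2, axis=-1, eps=1e-6):
    """Hypothetical sketch: normalize attention scores with an even power
    instead of exp. This is an assumed form, not the paper's definition."""
    if p % 2 != 0:
        raise ValueError("p must be even so that every weight is non-negative")
    powered = scores ** p  # a polynomial in the scores, unlike exp
    # eps (an assumption) guards against division by zero for all-zero rows.
    return powered / (powered.sum(axis=axis, keepdims=True) + eps)

scores = np.array([[1.0, -2.0, 3.0]])
weights = power_softmax(scores, p=2)
```

In this sketch the numerator is a low-degree polynomial, but the division is not. Under HE the reciprocal of the sum would still need its own polynomial approximation. Note also that an even power weights scores by magnitude, so `-2.0` receives more weight than `1.0` here, which differs from standard Softmax.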
