Network and Compiler Optimizations for Efficient Linear Algebra Kernels in Private Transformer Inference

Karthik Garimella; Negar Neda; Austin Ebel; Nandan Kumar Jha; Brandon Reagen

arXiv:2512.11135 · cs.CR · December 15, 2025

TL;DR

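This paper investigates network and compiler optimizations for efficient linear algebra kernels in private transformer inference using Fully Homomorphic Encryption (FHE). It reports substantial runtime improvements (up to 13.7x from the choice of matrix-vector kernel and up to 11.46x from pruning feed-forward layers) and analyzes the computational bottlenecks of FHE primitives.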

Contribution

It introduces optimized linear algebra kernel implementations and network pruning strategies for FHE-based transformer inference, extending the capabilities of the Orion framework.

Findings

1. The baby-step giant-step (BSGS) kernel outperforms the packed-row kernel by up to 13.7x at transformer scale.
2. Network pruning reduces the FHE runtime of feed-forward layers by up to 11.46x.
3. FHE primitives are memory-bound, performing only 0.1 operations per byte of DRAM traffic.
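The memory-bound finding can be read through a simple roofline model: a kernel's attainable throughput is capped by the smaller of the machine's peak compute rate and its arithmetic intensity (operations per byte) multiplied by DRAM bandwidth. The sketch below is a minimal illustration under stated assumptions. The hardware numbers (1 tera-op/s peak, 100 GB/s bandwidth) are hypothetical placeholders, not measurements from the paper; only the 0.1 ops/byte intensity comes from the finding. The helper names are likewise hypothetical.

```python
# Minimal roofline sketch: is a kernel compute-bound or memory-bound?
# PEAK_OPS and BANDWIDTH below are hypothetical placeholders, not figures
# from the paper; only the 0.1 ops/byte intensity is taken from the finding.


def attainable_ops_per_sec(intensity: float, peak_ops: float, bandwidth: float) -> float:
    """Roofline bound: min(peak compute, intensity * memory bandwidth).

    intensity -- operations performed per byte of DRAM traffic
    peak_ops  -- peak compute throughput, in operations per second
    bandwidth -- DRAM bandwidth, in bytes per second
    """
    return min(peak_ops, intensity * bandwidth)


def ridge_point(peak_ops: float, bandwidth: float) -> float:
    """Arithmetic intensity (ops/byte) at which a kernel stops being memory-bound."""
    return peak_ops / bandwidth


def is_memory_bound(intensity: float, peak_ops: float, bandwidth: float) -> bool:
    """True if the kernel sits left of the ridge point on the roofline."""
    return intensity < ridge_point(peak_ops, bandwidth)


if __name__ == "__main__":
    PEAK_OPS = 1e12   # hypothetical: 1 tera-op/s of arithmetic throughput
    BANDWIDTH = 1e11  # hypothetical: 100 GB/s of DRAM bandwidth
    INTENSITY = 0.1   # ops per byte, as reported for FHE primitives

    achieved = attainable_ops_per_sec(INTENSITY, PEAK_OPS, BANDWIDTH)
    print(f"ridge point: {ridge_point(PEAK_OPS, BANDWIDTH):.1f} ops/byte")
    print(f"attainable: {achieved:.2e} ops/s ({100 * achieved / PEAK_OPS:.1f}% of peak)")
    print("memory-bound:", is_memory_bound(INTENSITY, PEAK_OPS, BANDWIDTH))
```

With these placeholder numbers the ridge point sits at 10 ops/byte, so a kernel at 0.1 ops/byte can use at most 1% of peak compute. In that regime, faster arithmetic units alone would not help; reducing DRAM traffic would.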

Abstract

Large language model (LLM)-based services are primarily structured as client-server interactions, with clients sending queries directly to cloud providers that host LLMs. This approach currently compromises data privacy, as all queries must be processed in the cloud and in the clear. Fully Homomorphic Encryption (FHE) offers a solution to this privacy problem by enabling computation directly on encrypted queries. However, running encrypted transformer inference is challenging because programmers must map standard kernels onto the constrained instruction set that FHE provides. In this work, we explore implementations of the linear algebra kernels needed for transformer inference in FHE and examine how network optimization can help mitigate FHE costs while remaining performant. We leverage the Orion PyTorch-to-FHE framework to benchmark several linear algebra kernels in order to profile two…
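In packed FHE schemes, a linear algebra kernel must be expressed using only slot-wise addition, slot-wise multiplication, and cyclic rotation of the slots. The sketch below simulates this on plaintext NumPy vectors, with `np.roll` standing in for a ciphertext rotation, to show the diagonal method for matrix-vector multiplication and its baby-step giant-step (BSGS) variant, which cuts the number of vector rotations from n - 1 to (n1 - 1) + (n2 - 1) for n = n1 * n2. This is the textbook formulation; we assume, but cannot confirm from this summary, that Orion's BSGS kernel follows the same structure. The helper names (`rot`, `diagonal`, `matvec_diagonal`, `matvec_bsgs`) are hypothetical, and the sketch assumes a square matrix whose dimension equals the number of slots.

```python
import numpy as np


def rot(x: np.ndarray, k: int) -> np.ndarray:
    """Cyclic left rotation: rot(x, k)[j] == x[(j + k) % n].

    Stands in for an FHE ciphertext rotation, the expensive primitive.
    """
    return np.roll(x, -k)


def diagonal(M: np.ndarray, i: int) -> np.ndarray:
    """i-th generalized diagonal of a square matrix: d[j] = M[j, (j + i) % n]."""
    n = M.shape[0]
    return np.array([M[j, (j + i) % n] for j in range(n)])


def matvec_diagonal(M: np.ndarray, v: np.ndarray):
    """Diagonal method: y = sum_i diag_i(M) * rot(v, i), using n - 1 vector rotations."""
    n = M.shape[0]
    y = np.zeros(n)
    rotations = 0
    for i in range(n):
        if i == 0:
            rv = v
        else:
            rv = rot(v, i)
            rotations += 1
        y += diagonal(M, i) * rv  # slot-wise multiply, then slot-wise add
    return y, rotations


def matvec_bsgs(M: np.ndarray, v: np.ndarray, n1: int):
    """BSGS diagonal method with n = n1 * n2, using (n1 - 1) + (n2 - 1) rotations."""
    n = M.shape[0]
    if n % n1 != 0:
        raise ValueError("n1 must divide the matrix dimension")
    n2 = n // n1
    rotations = 0

    # Baby steps: rotate the input vector by 1..n1-1 once and reuse the results.
    baby = [v]
    for j in range(1, n1):
        baby.append(rot(v, j))
        rotations += 1

    y = np.zeros(n)
    for k in range(n2):
        shift = k * n1
        inner = np.zeros(n)
        for j in range(n1):
            # Plaintext diagonal pre-rotated by -shift; in FHE this can be
            # prepared offline and costs no ciphertext rotation.
            d = rot(diagonal(M, shift + j), -shift)
            inner += d * baby[j]
        # Giant step: one rotation of the partial sum for each k > 0.
        if shift:
            inner = rot(inner, shift)
            rotations += 1
        y += inner
    return y, rotations


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M = rng.standard_normal((16, 16))
    v = rng.standard_normal(16)
    y_diag, r_diag = matvec_diagonal(M, v)
    y_bsgs, r_bsgs = matvec_bsgs(M, v, n1=4)
    print(np.allclose(y_diag, M @ v), r_diag)
    print(np.allclose(y_bsgs, M @ v), r_bsgs)
```

For a 16 x 16 matrix with n1 = 4, the plain diagonal method rotates the vector 15 times, while BSGS needs 3 baby-step and 3 giant-step rotations; both reproduce `M @ v`. The trick relies on rotation distributing over slot-wise multiplication, so the extra rotations land on plaintext diagonals that can be precomputed rather than on ciphertexts.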

Taxonomy

Topics: Cryptography and Data Security · Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques