Network and Compiler Optimizations for Efficient Linear Algebra Kernels in Private Transformer Inference
Karthik Garimella, Negar Neda, Austin Ebel, Nandan Kumar Jha, Brandon Reagen

TL;DR
This paper investigates network and compiler optimizations for the linear algebra kernels used in private transformer inference under Fully Homomorphic Encryption (FHE). It reports runtime improvements of up to 13.7x from kernel choice and up to 11.46x from network pruning, and analyzes where the computational bottlenecks lie.
Contribution
It introduces optimized linear algebra kernel implementations and network pruning strategies for FHE-based transformer inference, extending the capabilities of the Orion framework.
Findings
The baby-step giant-step (BSGS) method outperforms the packed-row method by up to 13.7x at transformer scale.
Network pruning reduces the FHE runtime of feed-forward layers by up to 11.46x.
FHE primitives are memory-bound, performing about 0.1 operations per byte of DRAM traffic.
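To make the BSGS finding concrete, here is a minimal plaintext sketch of why baby-step giant-step reduces the number of ciphertext rotations in an encrypted matrix-vector product. It is not the paper's code and does not use the Orion API: the names `RotationCounter`, `matvec_diagonal`, and `matvec_bsgs` are hypothetical, NumPy arrays stand in for ciphertexts, and the baseline shown is the plain diagonal method rather than the packed-row kernel the paper benchmarks. It assumes a square n x n matrix and a baby-step count `n1` that divides n.

```python
import numpy as np


class RotationCounter:
    """Stands in for an FHE backend; counts simulated ciphertext rotations."""

    def __init__(self):
        self.count = 0

    def rot(self, x, k):
        # Left-rotate x by k slots: result[j] = x[(j + k) % n].
        if k % len(x) == 0:
            return x  # rotating by 0 costs nothing
        self.count += 1
        return np.roll(x, -k)


def diagonal(M, i):
    """i-th generalized diagonal of M: d[j] = M[j, (j + i) % n]."""
    n = M.shape[0]
    j = np.arange(n)
    return M[j, (j + i) % n]


def matvec_diagonal(M, v, counter):
    """Plain diagonal method: M v = sum_i diag_i(M) * rot(v, i)."""
    n = len(v)
    acc = np.zeros(n)
    for i in range(n):
        acc += diagonal(M, i) * counter.rot(v, i)
    return acc


def matvec_bsgs(M, v, n1, counter):
    """BSGS variant: write i = g*n1 + b and share the baby-step rotations."""
    n = len(v)
    assert n % n1 == 0, "this sketch assumes n1 divides n"
    n2 = n // n1
    # Baby steps: n1 - 1 rotations of the (encrypted) input, reused for every g.
    baby = [counter.rot(v, b) for b in range(n1)]
    acc = np.zeros(n)
    for g in range(n2):
        inner = np.zeros(n)
        for b in range(n1):
            # Diagonals are plaintext, so pre-rotating them right by g*n1
            # is done offline and is not counted as a ciphertext rotation.
            d = np.roll(diagonal(M, g * n1 + b), g * n1)
            inner += d * baby[b]
        # Giant step: one rotation per group, n2 - 1 in total.
        acc += counter.rot(inner, g * n1)
    return acc
```

For n = 16 and `n1 = 4`, the diagonal method performs 15 simulated rotations while the BSGS variant performs 6 (3 baby steps plus 3 giant steps); in general the count drops from n - 1 to n1 + n2 - 2, which is smallest when n1 is close to the square root of n. How this translates into wall-clock speedups depends on the FHE parameters and the packing scheme, which this sketch does not model.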
Abstract
Large language model (LLM) based services are primarily structured as client-server interactions, with clients sending queries directly to cloud providers that host LLMs. This approach currently compromises data privacy, as all queries must be processed in the cloud and in the clear. Fully Homomorphic Encryption (FHE) is a solution to this data privacy issue, enabling computations directly upon encrypted queries. However, running encrypted transformer inference is challenging, as programmers must map standard kernels to the constrained instruction set provided by FHE. In this work, we explore implementations of linear algebra kernels needed for transformer inference in FHE and examine how network optimization can help mitigate FHE costs while remaining performant. We leverage the Orion PyTorch-to-FHE framework to benchmark several linear algebra kernels in order to profile two…
Taxonomy
Topics: Cryptography and Data Security · Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques
