TruncFormer: Private LLM Inference Using Only Truncations

Patrick Yubeaton; Jianqiao Cambridge Mo; Karthik Garimella; Nandan Kumar Jha; Brandon Reagen; Chinmay Hegde; Siddharth Garg

arXiv:2412.01042·cs.CR·December 3, 2024



TL;DR

TruncFormer is a framework that takes any LLM and transforms it into a plaintext emulation of private inference, approximating the model's nonlinearities with truncations to reduce latency while keeping user data private.

Contribution

The paper presents a general method for emulating LLM nonlinearities with truncations, enabling lower-latency private inference across architectures.

Findings

01. Latency improvements over existing protocols

02. Applicable to any LLM architecture

03. Open source implementation available

Abstract

Private inference (PI) serves an important role in guaranteeing the privacy of user data when interfacing with proprietary machine learning models such as LLMs. However, PI remains practically intractable due to the massive latency costs associated with nonlinear functions present in LLMs. Existing works have focused on improving latency of specific LLM nonlinearities (such as the Softmax, or the GeLU) via approximations. However, new types of nonlinearities are regularly introduced with new LLM architectures, and this has led to a constant game of catch-up where PI researchers attempt to optimize the newest nonlinear function. We introduce TruncFormer, a framework for taking any LLM and transforming it into a plaintext emulation of PI. Our framework leverages the fact that nonlinearities in LLMs are differentiable and can be accurately approximated with a sequence of additions,…
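To make the idea of building a nonlinearity from basic arithmetic plus truncation concrete, here is a minimal plaintext sketch in fixed-point integers. It is an illustration under stated assumptions, not the authors' implementation: the abstract is cut off after "additions", so the use of multiplications here is an assumption, and the 16-bit precision, the helper names (`encode`, `truncate`, `fx_mul`, `fx_exp_neg`), and the choice of the limit formula (1 + x/2^k)^(2^k) for the exponential are all hypothetical.

```python
# Plaintext emulation of fixed-point arithmetic with explicit truncations.
# All names and parameters below are hypothetical, chosen for illustration.

FRAC_BITS = 16            # assumed number of fractional bits
SCALE = 1 << FRAC_BITS    # a real value v is stored as round(v * SCALE)


def encode(x: float) -> int:
    """Map a real number to its fixed-point integer representation."""
    return int(round(x * SCALE))


def decode(v: int) -> float:
    """Map a fixed-point integer back to a real number."""
    return v / SCALE


def truncate(v: int, bits: int = FRAC_BITS) -> int:
    """Drop the low `bits` bits (arithmetic shift, rounds toward -infinity)."""
    return v >> bits


def fx_mul(a: int, b: int) -> int:
    """Multiply two fixed-point values.

    The raw product carries 2 * FRAC_BITS fractional bits, so one
    truncation is needed to return to the working precision.
    """
    return truncate(a * b)


def fx_exp_neg(x: int, squarings: int = 8) -> int:
    """Approximate exp(x) for x <= 0 as (1 + x / 2^k)^(2^k), k = squarings.

    Uses one truncation and one addition to form the base, then k
    multiply-and-truncate steps (repeated squaring).
    """
    base = SCALE + truncate(x, squarings)   # 1 + x / 2^k
    for _ in range(squarings):
        base = fx_mul(base, base)
    return base


if __name__ == "__main__":
    approx = decode(fx_exp_neg(encode(-1.0)))
    print(f"fixed-point exp(-1) is approximately {approx:.4f}")
```

Every operation in the sketch is an integer addition, an integer multiplication, or a bit truncation. The result is only approximate: each truncation discards low-order bits, and repeated squaring amplifies that loss, so raising `FRAC_BITS` or changing `squarings` trades precision against the number of truncations performed.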


Taxonomy

Topics: Law, Economics, and Judicial Systems

Methods: Softmax