CipherPrune: Efficient and Scalable Private Transformer Inference

Yancheng Zhang; Jiaqi Xue; Mengxin Zheng; Mimi Xie; Mingzhe Zhang; Lei Jiang; Qian Lou

arXiv:2502.16782·cs.LG·March 7, 2025

TL;DR

CipherPrune introduces a privacy-preserving Transformer inference framework that significantly reduces runtime overhead and improves scalability by adaptively pruning tokens and reducing polynomial degrees, with minimal accuracy loss.

Contribution

It presents a novel secure token pruning and polynomial reduction protocol that enhances efficiency and scalability of private Transformer inference.

Findings

01. Reduces private Transformer inference overhead by up to 10.6x.

02. Effectively prunes unimportant tokens in encrypted form.

03. Maintains high accuracy with minimal performance trade-offs.

Abstract

Private Transformer inference using cryptographic protocols offers promising solutions for privacy-preserving machine learning; however, it still faces significant runtime overhead (efficiency issues) and challenges in handling long-token inputs (scalability issues). We observe that the Transformer's operational complexity scales quadratically with the number of input tokens, making it essential to reduce the input token length. Notably, each token varies in importance, and many inputs contain redundant tokens. Additionally, prior private inference methods that rely on high-degree polynomial approximations for non-linear activations are computationally expensive. Therefore, reducing the polynomial degree for less important tokens can significantly accelerate private inference. Building on these observations, we propose CipherPrune, an efficient and scalable private inference…
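The quadratic-cost observation in the abstract can be illustrated with a small plaintext sketch. This is not the CipherPrune protocol, which operates on encrypted data; it only shows why dropping low-importance tokens shrinks the attention cost. The importance metric (mean attention a token receives), the threshold value, and all function names are assumptions made for illustration.

```python
import math

def softmax(row):
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def token_importance(scores):
    """Assumed metric: importance of token j is the mean attention it receives."""
    attn = [softmax(row) for row in scores]
    n = len(attn)
    return [sum(attn[i][j] for i in range(n)) / n for j in range(n)]

def prune_tokens(scores, threshold):
    """Return indices of tokens whose importance meets the (hypothetical) threshold."""
    importance = token_importance(scores)
    return [j for j, s in enumerate(importance) if s >= threshold]

def attention_cost(n_tokens):
    """Pairwise attention touches n * n (query, key) pairs."""
    return n_tokens * n_tokens

# Toy 4-token input in which tokens 0 and 1 attract almost all attention.
scores = [[4.0, 4.0, 0.0, 0.0] for _ in range(4)]
kept = prune_tokens(scores, threshold=0.1)

print("kept tokens:", kept)
print("attention cost:", attention_cost(len(scores)), "->", attention_cost(len(kept)))
```

In this toy case, halving the token count from 4 to 2 cuts the pairwise attention cost from 16 to 4, which is the quadratic effect the abstract describes.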

Code & Models

Repositories

ucf-lou-lab-pet/cipher-prune-inference
none · Official

Videos

CipherPrune: Efficient and Scalable Private Transformer Inference · SlidesLive

Taxonomy

Topics: Coding theory and cryptography · DNA and Biological Computing · graph theory and CDMA systems

Methods: Attention Is All You Need · Absolute Position Encodings · Linear Layer · Layer Normalization · Byte Pair Encoding · Dense Connections · Residual Connection · Label Smoothing · Multi-Head Attention · Position-Wise Feed-Forward Layer