CipherPrune: Efficient and Scalable Private Transformer Inference
Yancheng Zhang, Jiaqi Xue, Mengxin Zheng, Mimi Xie, Mingzhe Zhang, Lei Jiang, Qian Lou

TL;DR
CipherPrune introduces a privacy-preserving Transformer inference framework that significantly reduces runtime overhead and improves scalability by adaptively pruning tokens and reducing polynomial degrees, with minimal accuracy loss.
Contribution
It presents a novel secure token pruning and polynomial reduction protocol that enhances efficiency and scalability of private Transformer inference.
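To make "pruning unimportant tokens" concrete, here is a minimal plaintext sketch of importance-based token pruning. It is not CipherPrune's secure protocol. It assumes that a token's importance is the average attention it receives across heads and query positions (a common heuristic in token-pruning work) and that tokens below a fixed threshold are dropped. The names `token_importance`, `prune_tokens` and `attention_cost`, the threshold of 1/n, and the toy data are all hypothetical. In the private setting the scores and the threshold comparison would presumably be computed on encrypted or secret-shared values rather than in the clear.

```python
import numpy as np

def token_importance(attn: np.ndarray) -> np.ndarray:
    """Average attention each token receives, over heads and query positions.

    attn: (heads, n, n) softmax attention; attn[h, i, j] is the weight query i puts on key j.
    Returns an (n,) vector. Because every softmax row sums to 1, the scores sum to 1.
    """
    return attn.mean(axis=(0, 1))

def prune_tokens(x: np.ndarray, attn: np.ndarray, threshold: float):
    """Drop tokens whose importance falls below `threshold` (hypothetical rule).

    x: (n, d) token embeddings. Returns the kept rows (original order preserved)
    and the boolean keep-mask.
    """
    keep = token_importance(attn) >= threshold
    return x[keep], keep

def attention_cost(n: int, d: int) -> int:
    """Rough multiply count for Q @ K^T plus (attention @ V): 2 * n^2 * d."""
    return 2 * n * n * d

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Toy example with random data (illustrative only).
rng = np.random.default_rng(0)
heads, n, d = 2, 8, 4
x = rng.standard_normal((n, d))
attn = softmax(rng.standard_normal((heads, n, n)))

# A threshold of 1/n keeps tokens that receive at least the average attention,
# so at least one token always survives.
x_kept, keep = prune_tokens(x, attn, threshold=1.0 / n)
m = int(keep.sum())
ratio = attention_cost(m, d) / attention_cost(n, d)
print(f"kept {m}/{n} tokens; attention cost ratio {ratio:.3f}")
```

The cost ratio illustrates the quadratic scaling the abstract points to: keeping m of n tokens shrinks the attention cost by roughly (m/n)^2, so halving the token count cuts that term about fourfold.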
Findings
Reduces private Transformer inference overhead by up to 10.6x.
Effectively prunes unimportant tokens in encrypted form.
Maintains high accuracy with minimal performance trade-offs.
Abstract
Private Transformer inference using cryptographic protocols offers promising solutions for privacy-preserving machine learning; however, it still faces significant runtime overhead (efficiency issues) and challenges in handling long-token inputs (scalability issues). We observe that the Transformer's operational complexity scales quadratically with the number of input tokens, making it essential to reduce the input token length. Notably, each token varies in importance, and many inputs contain redundant tokens. Additionally, prior private inference methods that rely on high-degree polynomial approximations for non-linear activations are computationally expensive. Therefore, reducing the polynomial degree for less important tokens can significantly accelerate private inference. Building on these observations, we propose CipherPrune, an efficient and scalable private inference…
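The degree-reduction idea can be sketched in plaintext as follows. The sketch assumes GELU is the non-linearity, fits least-squares polynomials on [-4, 4], and picks degrees 8 and 2 arbitrarily. The names `fit_poly`, `HIGH`, `LOW` and `mixed_degree_gelu` are hypothetical, and the importance mask is taken as given (for example from a pruning step like the one above). The paper's actual approximations and degrees are not stated in this excerpt.

```python
import math
import numpy as np

# Exact GELU(x) = x * Phi(x), with Phi the standard normal CDF, computed via math.erf.
_erf = np.vectorize(math.erf, otypes=[float])

def gelu(x: np.ndarray) -> np.ndarray:
    return 0.5 * x * (1.0 + _erf(x / math.sqrt(2.0)))

def fit_poly(degree: int, lo: float = -4.0, hi: float = 4.0, samples: int = 2001):
    """Least-squares polynomial fit of GELU on [lo, hi] (only meaningful inside that range)."""
    xs = np.linspace(lo, hi, samples)
    return np.polynomial.Polynomial.fit(xs, gelu(xs), degree)

HIGH = fit_poly(8)  # hypothetical "accurate" approximation for important tokens
LOW = fit_poly(2)   # hypothetical "cheap" approximation for less important tokens

def mixed_degree_gelu(x: np.ndarray, important: np.ndarray) -> np.ndarray:
    """x: (n, d) activations; important: (n,) boolean mask over tokens (rows)."""
    out = np.empty_like(x)
    out[important] = HIGH(x[important])    # high-degree path for important tokens
    out[~important] = LOW(x[~important])   # low-degree path for the rest
    return out

# Compare worst-case error on the fitting interval.
grid = np.linspace(-4.0, 4.0, 401)
err_high = float(np.max(np.abs(HIGH(grid) - gelu(grid))))
err_low = float(np.max(np.abs(LOW(grid) - gelu(grid))))
print(f"degree 8: max abs error {err_high:.4f}")
print(f"degree 2: max abs error {err_low:.4f}")

# Toy activations: 6 tokens, 3 features; rows 0, 2 and 5 are treated as important.
rng = np.random.default_rng(0)
acts = rng.standard_normal((6, 3))
important = np.array([True, False, True, False, False, True])
out = mixed_degree_gelu(acts, important)
```

With Horner's rule a degree-k polynomial needs k multiplications, and under cryptographic protocols each multiplication is expensive. Routing less important tokens through the low-degree fit therefore trades some approximation error for fewer secure multiplications.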
Taxonomy
Topics: Coding theory and cryptography · DNA and Biological Computing · graph theory and CDMA systems
Methods: Attention Is All You Need · Absolute Position Encodings · Linear Layer · Layer Normalization · Byte Pair Encoding · Dense Connections · Residual Connection · Label Smoothing · Multi-Head Attention · Position-Wise Feed-Forward Layer
