Accelerating Private Large Transformers Inference through Fine-grained Collaborative Computation

Yuntian Chen; Zhanyong Tang; Tianpei Lu; Bingsheng Zhang; Zhiying Shi; Zheng Wang

arXiv:2412.16537 · cs.CR · July 4, 2025

TL;DR

FASTLMPI significantly accelerates private large transformer inference by optimizing fine-grained cryptographic computations, reducing runtime and communication costs while maintaining accuracy.

Contribution

The paper introduces FASTLMPI, a novel fine-grained co-design approach for homomorphic encryption and secret sharing to improve private transformer inference efficiency.
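As background for the secret-sharing half of that co-design, here is a minimal sketch of two-party additive secret sharing. It is not FASTLMPI's protocol: the ring size `MODULUS` and the helper names `share`, `reconstruct`, `add_shares`, and `scale_share` are assumptions chosen for illustration. It only shows why additions and multiplications by public constants need no communication between the parties.

```python
import secrets

# Illustrative assumption: shares live in the ring of integers modulo 2**64.
MODULUS = 2 ** 64


def share(value, modulus=MODULUS):
    """Split value into two additive shares that sum to value mod modulus."""
    first = secrets.randbelow(modulus)   # uniformly random mask
    second = (value - first) % modulus   # the remainder hides nothing alone
    return first, second


def reconstruct(first, second, modulus=MODULUS):
    """Recombine two shares into the original value."""
    return (first + second) % modulus


def add_shares(a, b, modulus=MODULUS):
    """Each party adds its own shares locally; no messages are exchanged."""
    return (a + b) % modulus


def scale_share(s, public_constant, modulus=MODULUS):
    """Multiplying by a public constant is also a purely local operation."""
    return (s * public_constant) % modulus


# Party 0 holds (x0, y0); party 1 holds (x1, y1).
x0, x1 = share(12)
y0, y1 = share(30)

# Each party computes its share of x + y on its own.
total = reconstruct(add_shares(x0, y0), add_shares(x1, y1))

# Each party computes its share of 5 * x on its own.
scaled = reconstruct(scale_share(x0, 5), scale_share(x1, 5))
```

Multiplying two secret values, as in matrix multiplication, is the step that needs extra machinery such as homomorphic encryption or precomputed correlated randomness, and that is where a cost trade-off between HE and SS arises.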

Findings

1. 54% to 64% decrease in runtime
2. 72.2% reduction in communication costs
3. Enhanced approximation accuracy for non-linear functions

Abstract

Homomorphic encryption (HE) and secret sharing (SS) enable computations on encrypted data, providing significant privacy benefits for large transformer-based models (TBM) in sensitive sectors like medicine and finance. However, private TBM inference incurs significant costs due to the coarse-grained application of HE and SS. We present FASTLMPI, a new approach to accelerate private TBM inference through fine-grained computation optimization. Specifically, through the fine-grained co-design of homomorphic encryption and secret sharing, FASTLMPI achieves efficient protocols for matrix multiplication, SoftMax, LayerNorm, and GeLU. In addition, FASTLMPI introduces a precise segmented approximation technique for differentiable non-linear functions, improving fitting accuracy while maintaining a low polynomial degree. Compared to the BOLT solution (S&P'24), FASTLMPI shows a remarkable 54% to 64%…
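To illustrate the general idea behind segmented approximation, the sketch below compares one quadratic over [-4, 4] with four quadratics over shorter segments when approximating GeLU. It is a plaintext toy, not the paper's technique: the breakpoints, the use of Lagrange interpolation instead of a fitted polynomial, and the names `quadratic_through`, `segmented_gelu`, and `max_error` are all assumptions made for this example.

```python
import math


def gelu(x):
    """Exact GeLU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def quadratic_through(f, lo, hi):
    """Degree-2 Lagrange interpolant of f at lo, the midpoint, and hi."""
    xs = (lo, 0.5 * (lo + hi), hi)
    ys = tuple(f(x) for x in xs)

    def poly(x):
        total = 0.0
        for i in range(3):
            term = ys[i]
            for j in range(3):
                if j != i:
                    term *= (x - xs[j]) / (xs[i] - xs[j])
            total += term
        return total

    return poly


def segmented_gelu(breakpoints):
    """Piecewise approximation: one quadratic per pair of adjacent breakpoints."""
    pieces = [
        (lo, hi, quadratic_through(gelu, lo, hi))
        for lo, hi in zip(breakpoints, breakpoints[1:])
    ]

    def approx(x):
        if x < breakpoints[0]:
            return 0.0          # GeLU is close to 0 far to the left
        if x > breakpoints[-1]:
            return x            # GeLU is close to x far to the right
        for lo, hi, poly in pieces:
            if lo <= x <= hi:
                return poly(x)
        return x                # unreachable for sorted breakpoints

    return approx


def max_error(approx, lo, hi, samples=2001):
    """Largest absolute deviation from exact GeLU on an evenly spaced grid."""
    step = (hi - lo) / (samples - 1)
    return max(abs(approx(lo + k * step) - gelu(lo + k * step))
               for k in range(samples))


single = segmented_gelu([-4.0, 4.0])                     # one segment
segmented = segmented_gelu([-4.0, -2.0, 0.0, 2.0, 4.0])  # four segments
single_err = max_error(single, -4.0, 4.0)
segmented_err = max_error(segmented, -4.0, 4.0)
```

Both variants use degree-2 polynomials, yet the segmented version has a much smaller worst-case error. In a private-inference setting the segment selection and polynomial evaluation would run on encrypted or secret-shared values, which is why keeping the degree low matters.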

Taxonomy

Topics: Quantum Computing Algorithms and Architecture · Ferroelectric and Negative Capacitance Devices · Stochastic Gradient Optimization Techniques