Comet: A Communication-efficient and Performant Approximation for Private Transformer Inference

Xiangrui Xu; Qiao Zhang; Rui Ning; Chunsheng Xin; Hongyi Wu

arXiv:2405.17485·cs.LG·September 10, 2024·1 cites

Open Access

TL;DR

This paper introduces Comet, a plug-in method that reduces communication costs in private Transformer inference by up to 3.9× and speeds up inference by up to 3.5×, while keeping model performance competitive.

Contribution

Comet is a new plug-in approach that cuts communication overhead and speeds up private Transformer inference while keeping model performance competitive with prior work.

Findings

01 Up to 3.9× less communication

02 Up to 3.5× speedup in inference

03 Competitive performance on GLUE benchmarks

Abstract

The prevalent use of Transformer-like models, exemplified by ChatGPT, in modern language processing applications underscores the critical need for private inference, which is essential for many cloud-based services that rely on such models. However, current privacy-preserving frameworks impose a significant communication burden, especially for non-linear computation in Transformer models. In this paper, we introduce Comet, a novel plug-in method that effectively reduces the communication cost without compromising inference performance. We also introduce an efficient approximation method that eliminates the heavy communication otherwise needed to find a good initial approximation. We evaluate Comet on BERT and RoBERTa models with the GLUE benchmark datasets, showing up to 3.9$\times$ less communication and 3.5$\times$ speedups compared to the prior art, while keeping model performance competitive.

Taxonomy

Topics: Cryptography and Data Security · Privacy-Preserving Technologies in Data · Complexity and Algorithms in Graphs

Methods: Attention Is All You Need · WordPiece · Linear Warmup With Linear Decay · Weight Decay · Attention Dropout · Softmax · RoBERTa · Layer Normalization · BERT