Comet: A Communication-efficient and Performant Approximation for Private Transformer Inference
Xiangrui Xu, Qiao Zhang, Rui Ning, Chunsheng Xin, Hongyi Wu

TL;DR
This paper introduces Comet, a plug-in method that reduces communication costs in private Transformer inference, yielding faster inference with minimal accuracy loss.
Contribution
Comet is a new plug-in approach that cuts communication overhead and speeds up private Transformer inference while keeping model accuracy competitive.
Findings
Up to 3.9× less communication than the prior art
Up to 3.5× faster inference than the prior art
Competitive model performance on the GLUE benchmark datasets
Abstract
The prevalent use of Transformer-like models in modern language processing applications, exemplified by ChatGPT, underscores the critical need for private inference, which is essential for many cloud-based services that rely on such models. However, current privacy-preserving frameworks impose a significant communication burden, especially for non-linear computation in Transformer models. In this paper, we introduce Comet, a novel plug-in method that effectively reduces the communication cost without compromising inference performance. We also introduce an efficient approximation method that eliminates the heavy communication involved in finding a good initial approximation. We evaluate Comet on BERT and RoBERTa models with the GLUE benchmark datasets, showing up to 3.9× less communication and 3.5× speedups compared to the prior art, while keeping model performance competitive.
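To illustrate why a good initial approximation matters, here is a minimal plaintext sketch. It assumes a Newton-style iteration for the inverse square root, a common way to evaluate non-linear functions using only additions and multiplications. The abstract does not state which iteration Comet uses, so the function `inv_sqrt_newton`, the input `x`, and both starting guesses are hypothetical and chosen only for illustration. In typical secret-sharing protocols each multiplication costs communication, so fewer iterations generally means less traffic.

```python
import math


def inv_sqrt_newton(x, y0, tol=1e-6, max_iters=100):
    """Approximate 1/sqrt(x) with Newton's iteration.

    Returns (estimate, iterations_used). Illustrative plaintext code only;
    this is not the protocol from the paper.
    """
    # Plaintext reference value, used here only to measure convergence.
    target = 1.0 / math.sqrt(x)
    y = y0
    for i in range(max_iters):
        if abs(y - target) <= tol * target:
            return y, i
        # Newton update for f(y) = 1/y^2 - x; needs only multiply and add.
        y = y * (3.0 - x * y * y) / 2.0
    return y, max_iters


x = 50.0  # hypothetical input
# Both guesses lie in (0, sqrt(3/x)), the range where this iteration converges.
poor_estimate, poor_iters = inv_sqrt_newton(x, y0=0.001)
good_estimate, good_iters = inv_sqrt_newton(x, y0=0.14)

print(f"poor initial guess: {poor_iters} iterations")
print(f"good initial guess: {good_iters} iterations")
```

The guess close to the true value converges in far fewer iterations than the distant one. The catch in a private setting is that picking such a guess normally requires knowing something about the secret input, which is the costly step the abstract says Comet's approximation method avoids.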
Taxonomy
Topics: Cryptography and Data Security · Privacy-Preserving Technologies in Data · Complexity and Algorithms in Graphs
Methods: Attention Is All You Need · WordPiece · Linear Warmup With Linear Decay · Weight Decay · Attention Dropout · Softmax · RoBERTa · Layer Normalization · BERT
