Comet: Accelerating Private Inference for Large Language Model by Predicting Activation Sparsity

Guang Yan; Yuhui Zhang; Zimu Guo; Lutan Zhao; Xiaojun Chen; Chen Wang; Wenhao Wang; Dan Meng; Rui Hou

arXiv:2505.07239·cs.CR·May 13, 2025

TL;DR

Comet is a private inference system for large language models that leverages activation sparsity prediction to significantly reduce communication and computation overhead in secure multi-party computation settings.

Contribution

Comet introduces a predictor for activation sparsity and a new private inference protocol that skips computations on zero-valued activations, improving the efficiency of privacy-preserving LLM inference.

Findings

01. Achieves a 1.87x-2.63x speedup over state-of-the-art systems

02. Reduces communication by 1.94x-2.64x

03. Effectively exploits activation sparsity for privacy-preserving inference

Abstract

With the growing use of large language models (LLMs) hosted on cloud platforms to offer inference services, privacy concerns about the potential leakage of sensitive information are escalating. Secure multi-party computation (MPC) is a promising solution for protecting privacy in LLM inference. However, MPC requires frequent inter-server communication, which causes high performance overhead. Inspired by the prevalent activation sparsity of LLMs, where most neurons are not activated after non-linear activation functions, we propose an efficient private inference system, Comet. This system employs an accurate and fast predictor to predict the sparsity distribution of the activation function output. Additionally, we introduce a new private inference protocol. It efficiently and securely avoids computations involving zero values by exploiting the spatial locality of the predicted sparse…
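
The idea of skipping non-activated neurons can be illustrated in plaintext. The following is a minimal sketch, not Comet's protocol: it involves no MPC or secret sharing, and it uses an oracle in place of a learned predictor. The function names (`ffn_dense`, `ffn_sparse`), the toy dimensions, and the random Gaussian weights are all assumptions made for illustration. It shows only that a ReLU feed-forward layer restricted to the active neurons gives the same output as the dense layer.

```python
import random


def relu(v):
    return v if v > 0.0 else 0.0


def ffn_dense(x, w_up, w_down):
    """Dense feed-forward layer: y = relu(W_up x) W_down. Returns (y, hidden)."""
    hidden = [relu(sum(w * xi for w, xi in zip(row, x))) for row in w_up]
    d_out = len(w_down[0])
    y = [sum(hidden[j] * w_down[j][k] for j in range(len(hidden)))
         for k in range(d_out)]
    return y, hidden


def ffn_sparse(x, w_up, w_down, active):
    """Same layer, but only neurons listed in `active` are ever computed."""
    d_out = len(w_down[0])
    y = [0.0] * d_out
    for j in active:
        h = relu(sum(w * xi for w, xi in zip(w_up[j], x)))
        for k in range(d_out):
            y[k] += h * w_down[j][k]
    return y


# Toy sizes and random weights (hypothetical, for illustration only).
rng = random.Random(0)
d_in, d_hidden, d_out = 8, 64, 8
x = [rng.gauss(0, 1) for _ in range(d_in)]
w_up = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_hidden)]
w_down = [[rng.gauss(0, 1) for _ in range(d_out)] for _ in range(d_hidden)]

y_dense, hidden = ffn_dense(x, w_up, w_down)

# Oracle "predictor": reads the true activations. A real predictor would
# have to estimate this set without computing the full layer.
active = [j for j, h in enumerate(hidden) if h > 0.0]

y_sparse = ffn_sparse(x, w_up, w_down, active)
sparsity = 1.0 - len(active) / d_hidden
max_err = max(abs(a - b) for a, b in zip(y_dense, y_sparse))
```

With random Gaussian weights, roughly half of the neurons are inactive; the abstract's premise is that trained LLMs are considerably sparser than this toy layer. In an MPC setting the saving would matter more, since each skipped multiplication also avoids its associated inter-server communication.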
