FastQuery: Communication-efficient Embedding Table Query for Private LLM Inference
Chenqi Lin, Tianshi Xu, Zebin Yang, Runsheng Wang, Ru Huang, Meng Li

TL;DR
FastQuery is a framework that reduces the computation and communication overhead of private embedding table queries in HE-based private LLM inference. It combines communication-aware quantization of the embedding table with one-hot-aware packing of the query.
Contribution
FastQuery introduces a communication-aware quantization and one-hot-aware packing approach to optimize private embedding table queries, outperforming prior HE-based methods.
Findings
Achieves over 4.3x latency reduction compared to Cheetah.
Reduces communication by more than 75.7x on LLAMA-7B.
Demonstrates significant efficiency improvements on LLAMA-30B.
Abstract
With the fast evolution of large language models (LLMs), privacy concerns with user queries arise as they may contain sensitive information. Private inference based on homomorphic encryption (HE) has been proposed to protect user query privacy. However, a private embedding table query has to be formulated as an HE-based matrix-vector multiplication problem and suffers from enormous computation and communication overhead. We observe that the overhead mainly comes from neglecting 1) the one-hot nature of user queries and 2) the robustness of the embedding table to low bit-width quantization noise. Hence, in this paper, we propose a private embedding table query optimization framework, dubbed FastQuery. FastQuery features a communication-aware embedding table quantization algorithm and a one-hot-aware dense packing algorithm to simultaneously reduce both the computation and communication…
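The two observations in the abstract can be illustrated in plaintext, with no encryption involved. The sketch below shows that an embedding lookup is the same as multiplying a one-hot vector by the table, and that a low bit-width copy of the table returns the same row up to a bounded rounding error. The helper names (`one_hot`, `query_as_matvec`, `quantize_table`) and the toy sizes are assumptions made for illustration, and the uniform quantizer is a generic stand-in, not FastQuery's communication-aware algorithm or its packing scheme.

```python
import random

def one_hot(index, vocab_size):
    """Return a one-hot list of length vocab_size with a 1 at position index."""
    vec = [0] * vocab_size
    vec[index] = 1
    return vec

def query_as_matvec(table, onehot):
    """Select a row by computing onehot^T * table as a matrix-vector product."""
    dim = len(table[0])
    return [sum(onehot[r] * table[r][c] for r in range(len(table)))
            for c in range(dim)]

def quantize_table(table, bits):
    """Uniform symmetric quantization to signed integers (generic stand-in)."""
    max_abs = max(abs(v) for row in table for v in row)
    qmax = 2 ** (bits - 1) - 1
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q_table = [[round(v / scale) for v in row] for row in table]
    return q_table, scale

# Toy embedding table: 8 tokens, 4 dimensions (hypothetical sizes).
random.seed(0)
vocab_size, dim = 8, 4
table = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(vocab_size)]
token_id = 5
query = one_hot(token_id, vocab_size)

# Observation 1: the matrix-vector product with a one-hot query is a row lookup.
row = query_as_matvec(table, query)

# Observation 2: a 4-bit table yields the same row up to rounding noise.
q_table, scale = quantize_table(table, bits=4)
q_row = query_as_matvec(q_table, query)
dequantized = [v * scale for v in q_row]
max_err = max(abs(a - b) for a, b in zip(row, dequantized))
```

In the private setting the one-hot query would be encrypted and the product evaluated homomorphically, so smaller table entries and a sparser query encoding translate into less ciphertext work and traffic. This sketch only checks the arithmetic that such a protocol would have to preserve.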
Taxonomy
Topics: Cryptography and Data Security · Privacy-Preserving Technologies in Data · Library Science and Information Systems
