A Scalable Multi-GPU Framework for Encrypted Large-Model Inference
Siddharth Jayashankar, Joshua Kim, Michael B. Sullivan, Wenting Zheng, Dimitrios Skarlatos

TL;DR
Cerium is a multi-GPU framework that enables efficient encrypted inference of large models using FHE, achieving performance comparable to specialized ASICs and surpassing previous GPU solutions.
Contribution
The paper introduces a GPU-based system with novel compiler and memory management techniques for scalable encrypted inference on large models.
Findings
Outperforms hand-optimized GPU libraries by up to 2.25x for small models
Achieves bootstrapping in under 10 milliseconds for the first time on GPUs
Enables encrypted inference for BERT-Base and Llama3-8B in seconds to minutes
Abstract
Encrypted AI using fully homomorphic encryption (FHE) provides strong privacy guarantees, but its slow performance has limited practical deployment. Recent work has proposed ASICs to accelerate FHE, but these require expensive advanced manufacturing processes that constrain their accessibility. GPUs are a far more accessible platform, but achieving ASIC-level performance on GPUs has remained elusive. Furthermore, state-of-the-art approaches focus primarily on small models that fit comfortably within a single device. Supporting large models such as LLMs in FHE dramatically increases computational complexity, which requires optimized GPU kernels, and involves managing terabyte-scale memory footprints that far exceed the capacity of a single GPU. This paper presents Cerium, a multi-GPU framework for FHE inference on large models. Cerium integrates a domain-specific language, an optimizing…
Taxonomy
Topics: Cryptography and Data Security · Cryptographic Implementations and Security · Privacy-Preserving Technologies in Data
