TL;DR
This paper introduces CROSS, a compiler framework that adapts homomorphic encryption workloads to AI ASIC architectures like TPUs, significantly improving energy efficiency and throughput.
Contribution
CROSS systematically transforms HE workloads to match TPU architecture, enabling ASIC-level efficiency for homomorphic encryption operations.
Findings
CROSS achieves higher throughput per watt than existing HE acceleration frameworks.
Basis-Aligned Transformation (BAT) converts high-precision modular arithmetic into low-precision matrix multiplications.
Memory-Aligned Transformation (MAT) embeds data reordering into compute kernels, reducing runtime overhead.
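To make the BAT finding concrete, here is a minimal sketch of one way to turn a 32-bit modular multiplication by a known constant into a product of 8-bit values. It illustrates the general idea only and is not the paper's actual algorithm. The function names (`to_limbs`, `constant_matrix`, `mulmod_via_matmul`) and the choice of four 8-bit limbs are assumptions made for this example.

```python
LIMB_BITS = 8
NUM_LIMBS = 4  # four 8-bit limbs cover one 32-bit value (assumption for this sketch)


def to_limbs(x, n=NUM_LIMBS):
    """Split a non-negative integer into n little-endian 8-bit limbs."""
    return [(x >> (LIMB_BITS * i)) & 0xFF for i in range(n)]


def from_limbs(limbs):
    """Recombine little-endian limbs (which may exceed 8 bits) into one integer."""
    return sum(v << (LIMB_BITS * j) for j, v in enumerate(limbs))


def constant_matrix(b, q):
    """Precompute, offline, the limbs of (b * 2^(8i)) mod q for each limb position i.

    Every entry is an 8-bit value because q < 2^32.
    """
    return [to_limbs((b << (LIMB_BITS * i)) % q) for i in range(NUM_LIMBS)]


def mulmod_via_matmul(a, matrix, q):
    """Compute (a * b) mod q as a vector-matrix product over 8-bit operands."""
    a_limbs = to_limbs(a)
    # Low-precision product: both inputs are 8-bit, accumulators are wider.
    acc = [
        sum(a_limbs[i] * matrix[i][j] for i in range(NUM_LIMBS))
        for j in range(NUM_LIMBS)
    ]
    # A final wide reduction is still needed in this simplified sketch.
    return from_limbs(acc) % q


q = 2**31 - 1
a, b = 1234567890, 987654321
result = mulmod_via_matmul(a, constant_matrix(b, q), q)
```

The sketch works because a = Σ aᵢ·2^(8i), so a·b mod q equals Σ aᵢ·(b·2^(8i) mod q) mod q, and each precomputed term fits in four bytes. Batching many values of `a` would turn the vector-matrix product into a matrix-matrix product of the kind a matrix engine is built for.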
Abstract
Homomorphic Encryption (HE) provides strong data privacy for cloud services but at the cost of prohibitive computational overhead. While GPUs have emerged as a practical platform for accelerating HE, there remains an order-of-magnitude energy-efficiency gap compared to specialized (but expensive) HE ASICs. This paper explores an alternate direction: leveraging existing AI accelerators, like Google's TPUs with coarse-grained compute and memory architectures, to offer a path toward ASIC-level energy efficiency for HE. However, this architectural paradigm creates a fundamental mismatch with state-of-the-art (SoTA) HE algorithms designed for GPUs. These algorithms rely heavily on: (1) high-precision (32-bit) integer arithmetic, which must now run on a TPU's low-throughput vector unit, leaving its high-throughput low-precision (8-bit) matrix engine (MXU) idle, and (2) fine-grained data permutations that are inefficient…
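The permutation cost raised in point (2) can sometimes be avoided by folding a fixed reordering into the parameters of the kernel that consumes the data. The following is a minimal sketch of that general idea, assuming the permutation is known ahead of time. It is not the paper's MAT implementation, and the helper names (`vecmat`, `permute`, `fold_permutation`) are hypothetical.

```python
def vecmat(x, w):
    """Multiply a row vector x by a matrix w given as a list of rows."""
    rows, cols = len(w), len(w[0])
    return [sum(x[i] * w[i][j] for i in range(rows)) for j in range(cols)]


def permute(x, perm):
    """Runtime reordering: output position k takes input element perm[k]."""
    return [x[p] for p in perm]


def fold_permutation(w, perm):
    """Offline step: reorder the rows of w so the runtime permute is unnecessary.

    Row k of w was meant to meet x[perm[k]], so it moves to row perm[k].
    """
    folded = [None] * len(w)
    for k, p in enumerate(perm):
        folded[p] = w[k]
    return folded


perm = [2, 0, 3, 1]
x = [1, 2, 3, 4]
w = [[1, 0, 2], [3, 1, 0], [0, 5, 1], [2, 2, 2]]

with_runtime_permute = vecmat(permute(x, perm), w)
with_folded_weights = vecmat(x, fold_permutation(w, perm))
```

Both paths compute the same result, but the second touches the input in its original layout, so the reordering cost is paid once when the matrix is prepared rather than on every call.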
Videos
CROSS — Leveraging AI ASICs for Homomorphic Encryption· youtube
