Accelerating Number Theoretic Transformations for Bootstrappable Homomorphic Encryption on GPUs
Sangpyo Kim, Wonkyung Jung, Jaiyoung Park, Jung Ho Ahn

TL;DR
This paper analyzes how NTT differs from DFT in homomorphic encryption, identifies a main-memory bandwidth bottleneck in GPU implementations at large HE parameters, and proposes an on-the-fly root generation scheme (on-the-fly twiddling) to accelerate NTT, reporting a 4.2x speedup over a baseline GPU implementation.
Contribution
It provides a detailed analysis of NTT versus DFT, identifies GPU memory bottlenecks for large HE parameters, and introduces a new on-the-fly twiddling method to improve NTT performance.
Findings
NTT on GPUs suffers from a main-memory bandwidth bottleneck at large HE parameters.
The proposed on-the-fly twiddling scheme reduces NTT computation time.
The optimized implementation achieves a 4.2x speedup over the baseline GPU implementation.
Abstract
Homomorphic encryption (HE) draws huge attention as it provides a way of privacy-preserving computations on encrypted messages. Number Theoretic Transform (NTT), a specialized form of Discrete Fourier Transform (DFT) in the finite field of integers, is the key algorithm that enables fast computation on encrypted ciphertexts in HE. Prior works have accelerated NTT and its inverse transformation on a popular parallel processing platform, GPU, by leveraging DFT optimization techniques. However, these GPU-based studies lack a comprehensive analysis of the primary differences between NTT and DFT or only consider small HE parameters that have tight constraints in the number of arithmetic operations that can be performed without decryption. In this paper, we analyze the algorithmic characteristics of NTT and DFT and assess the performance of NTT when we apply the optimizations that are…
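As a rough illustration of the ideas above, the following Python sketch computes an NTT (a DFT over the integers modulo a prime) in two ways: reading twiddle factors from a precomputed table, and regenerating them on demand. This is not the paper's scheme, which the excerpt does not describe; the on-the-fly variant here simply recomputes each power by modular exponentiation to show the trade of memory reads for arithmetic. The function names and the toy parameters (q = 17, n = 8, omega = 2) are assumptions chosen for the example.

```python
def bit_reverse_permute(a):
    """Return a copy of `a` with elements moved to bit-reversed indices."""
    n = len(a)
    bits = n.bit_length() - 1
    out = list(a)
    for i in range(n):
        r = 0
        for b in range(bits):
            if (i >> b) & 1:
                r |= 1 << (bits - 1 - b)
        out[r] = a[i]
    return out


def _ntt(a, q, twiddle):
    """Iterative radix-2 NTT; twiddle(i) must return omega^i mod q."""
    n = len(a)
    a = bit_reverse_permute(a)
    length = 2
    while length <= n:
        half = length // 2
        step = n // length  # stage twiddles are omega^(j * n / length)
        for start in range(0, n, length):
            for j in range(half):
                w = twiddle(j * step)
                u = a[start + j]
                v = a[start + j + half] * w % q
                a[start + j] = (u + v) % q
                a[start + j + half] = (u - v) % q
        length *= 2
    return a


def ntt_table(a, q, omega):
    """Twiddle factors are read from a precomputed table (extra memory traffic)."""
    table = [pow(omega, i, q) for i in range(len(a) // 2)]
    return _ntt(a, q, lambda i: table[i])


def ntt_on_the_fly(a, q, omega):
    """Twiddle factors are recomputed when needed (extra arithmetic, no table)."""
    return _ntt(a, q, lambda i: pow(omega, i, q))


def ntt_naive(a, q, omega):
    """Reference O(n^2) definition: A[k] = sum_j a[j] * omega^(j*k) mod q."""
    n = len(a)
    return [sum(a[j] * pow(omega, j * k, q) for j in range(n)) % q
            for k in range(n)]


# Toy parameters (assumed for illustration): 2 has order 8 modulo 17.
q, omega = 17, 2
a = [1, 2, 3, 4, 5, 6, 7, 8]
assert ntt_table(a, q, omega) == ntt_on_the_fly(a, q, omega) == ntt_naive(a, q, omega)
```

Both fast variants produce the same result as the quadratic reference; they differ only in where the twiddle factors come from. On a GPU that is already limited by memory bandwidth, that difference is what an on-the-fly approach aims to exploit.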
