# SCE-NTT: A Hardware Accelerator for Number Theoretic Transform Using Superconductor Electronics

**Authors:** Sasan Razmkhah, Mingye Li, Zeming Cheng, Robert S. Aviles, Kyle Jackman, Joey Delport, Lieze Schindler, Wenhui Luo, Takuya Suzuki, Mehdi Kamal, Christopher L. Ayala, Coenrad J. Fourie, Nobuyuki Yoshikawa, Peter A. Beerel, Sandeep Gupta, Massoud Pedram

arXiv: 2508.21265 · 2025-09-01

## TL;DR

This paper introduces SCE-NTT, a superconducting (SFQ-based) hardware accelerator for the Number-Theoretic Transform. Its NTT-128 unit reaches 531 million NTT/sec at 34 GHz, over 100x faster than state-of-the-art CMOS equivalents, targeting scalable and energy-efficient homomorphic encryption.

## Contribution

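The paper presents the first superconducting NTT accelerator using SFQ logic. It combines a deeply pipelined architecture built on shift register memory with a new RSFQ cell library of over 50 parameterized cells, and reports throughput over 100x higher than state-of-the-art CMOS equivalents.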

## Key findings

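- The NTT-128 unit achieves 531 million NTT/sec at 34 GHz
- Over 100x faster than state-of-the-art CMOS equivalents
- Projected to scale to larger sizes, such as a 2^14-point NTT in approximately 482 ns
- Key-switch throughput estimated at 1.63 million operations/sec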
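The throughput and clock figures above can be related with a quick calculation. The sketch below only divides the two reported numbers. Reading the result as "one butterfly per clock per pipeline stage" is an assumption made here for illustration, not a statement from the paper.

```python
# Back-of-the-envelope check relating the reported clock rate and throughput.
CLOCK_HZ = 34e9       # reported clock frequency
NTT_PER_SEC = 531e6   # reported NTT-128 throughput
N = 128               # transform size

# Clock cycles that elapse between consecutive completed transforms.
cycles_per_ntt = CLOCK_HZ / NTT_PER_SEC

# A radix-2 NTT stage performs N/2 butterflies per transform.
butterflies_per_stage = N // 2

print(f"cycles per NTT:        {cycles_per_ntt:.2f}")
print(f"butterflies per stage: {butterflies_per_stage}")
```

The ratio comes out close to N/2 = 64, which would be consistent with a fully pipelined design in which each stage accepts one butterfly per clock cycle (again, an interpretation rather than a reported fact).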

## Abstract

This research explores the use of superconductor electronics (SCE) for accelerating fully homomorphic encryption (FHE), focusing on the Number-Theoretic Transform (NTT), a key computational bottleneck in FHE schemes. We present SCE-NTT, a dedicated hardware accelerator based on superconductive single flux quantum (SFQ) logic and memory, targeting high performance and energy efficiency beyond the limits of conventional CMOS. To address SFQ constraints such as limited dense RAM and restricted fanin/fanout, we propose a deeply pipelined NTT-128 architecture using shift register memory (SRM). Designed for N=128 32-bit coefficients, NTT-128 comprises log2(N)=7 processing elements (PEs), each featuring a butterfly unit (BU), dual coefficient memories operating in ping-pong mode via FIFO-based SRM queues, and twiddle factor buffers. The BU integrates a Shoup modular multiplier optimized for a small area, leveraging precomputed twiddle factors. A new RSFQ cell library with over 50 parameterized cells, including compound logic units, was developed for implementation. Functional and timing correctness were validated using JoSIM analog simulations and Verilog models. A multiphase clocking scheme was employed to enhance robustness and reduce path-balancing overhead, improving circuit reliability. Fabricated results show the NTT-128 unit achieves 531 million NTT/sec at 34 GHz, over 100x faster than state-of-the-art CMOS equivalents. We also project that the architecture can scale to larger sizes, such as a 2^14-point NTT in approximately 482 ns. Key-switch throughput is estimated at 1.63 million operations/sec, significantly exceeding existing hardware. These results demonstrate the strong potential of SCE-based accelerators for scalable, energy-efficient secure computation in the post-quantum era, with further gains anticipated through advances in fabrication.
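To make the butterfly and Shoup multiplier described in the abstract more concrete, here is a minimal software sketch of a 128-point radix-2 NTT with log2(N) = 7 stages, where each twiddle multiplication uses the standard Shoup trick (a precomputed companion constant per twiddle). It is a behavioral model only: the modulus `Q`, the helper names, and the data ordering are assumptions chosen for illustration, and the paper's area-optimized hardware multiplier may differ in detail.

```python
BETA_BITS = 32                 # word size; operands are 32-bit coefficients
Q = 2013265921                 # assumed example prime (15 * 2**27 + 1), below 2**31
N = 128                        # transform size from the abstract


def shoup_precompute(w, q):
    """Companion constant floor(w * 2^32 / q), stored alongside the twiddle w."""
    return (w << BETA_BITS) // q


def shoup_mulmod(x, w, w_shoup, q):
    """Return x * w mod q using one high multiply and one conditional subtract."""
    hi = (x * w_shoup) >> BETA_BITS               # estimate of floor(x * w / q)
    r = (x * w - hi * q) & ((1 << BETA_BITS) - 1) # lies in [0, 2q) when q < 2^31
    return r - q if r >= q else r


def find_root_of_unity(n, q):
    """Search for an element of order exactly n (n a power of two dividing q - 1)."""
    for g in range(2, q):
        w = pow(g, (q - 1) // n, q)
        if pow(w, n // 2, q) == q - 1:
            return w
    raise ValueError("no root of unity found")


def ntt(a, q, omega):
    """Iterative radix-2 decimation-in-time NTT with log2(n) butterfly stages."""
    n = len(a)
    bits = n.bit_length() - 1
    # Bit-reversal permutation of the input.
    a = [a[int(format(i, f"0{bits}b")[::-1], 2)] for i in range(n)]
    length = 2
    while length <= n:
        half = length // 2
        w_len = pow(omega, n // length, q)
        # Per-stage twiddle table with Shoup companions (software stand-in
        # for the twiddle factor buffers mentioned in the abstract).
        tw = [pow(w_len, j, q) for j in range(half)]
        tw_shoup = [shoup_precompute(w, q) for w in tw]
        for start in range(0, n, length):
            for j in range(half):
                u = a[start + j]
                v = shoup_mulmod(a[start + j + half], tw[j], tw_shoup[j], q)
                a[start + j] = (u + v) % q
                a[start + j + half] = (u - v) % q
        length *= 2
    return a


def naive_ntt(a, q, omega):
    """O(n^2) reference transform used only to check the fast version."""
    n = len(a)
    return [sum(a[i] * pow(omega, i * k, q) for i in range(n)) % q
            for k in range(n)]


omega = find_root_of_unity(N, Q)
coeffs = [(i * i + 3) % Q for i in range(N)]   # deterministic test input
result = ntt(coeffs, Q, omega)
stages = N.bit_length() - 1                    # one stage per processing element
print("stages:", stages)
print("matches naive transform:", result == naive_ntt(coeffs, Q, omega))
```

Because the companion constant depends only on the twiddle factor, it can be computed once and stored with the twiddle, which is why Shoup multiplication pairs naturally with precomputed twiddle tables.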

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.21265/full.md

## Figures

22 figures with captions in the complete paper: https://tomesphere.com/paper/2508.21265/full.md

## References

37 references — full list in the complete paper: https://tomesphere.com/paper/2508.21265/full.md

---
Source: https://tomesphere.com/paper/2508.21265