ESPM-D: Efficient Sparse Polynomial Multiplication for Dilithium on ARM Cortex-M4 and Apple M2
Jieyu Zheng, Hong Zhang, Le Tian, Zhuo Zhang, Hanyu Wei, Zhiwei Chu,, Yafang Yang, Yunlei Zhao

TL;DR
This paper presents optimized sparse polynomial multiplication techniques for Dilithium, achieving significant speedups and reduced resource usage on ARM Cortex-M4 and Apple M2 platforms, advancing post-quantum cryptography implementations.
Contribution
It introduces platform-specific optimization strategies for Dilithium's polynomial multiplication, significantly improving efficiency on resource-constrained and high-performance devices.
Findings
Up to 30% speedup on ARM Cortex-M4
Up to 55% speedup on Apple M2
Reduced stack usage and enhanced signing performance
Abstract
Dilithium is a lattice-based digital signature scheme standardized by the NIST post-quantum cryptography (PQC) project. In this study, we focus on developing efficient sparse polynomial multiplication implementations of Dilithium for ARM Cortex-M4 and Apple M2, which are both based on the ARM architecture. The ARM Cortex-M4 is commonly utilized in resource-constrained devices such as sensors. Conversely, the Apple M2 is typically found on mobile devices, emphasizing high performance and versatility. Accordingly, our optimization strategies differ between ARM Cortex-M4 and Apple M2. We prioritize optimizing stack usage for the former while enhancing computational efficiency for the latter. Our optimized sparse polynomial multiplication achieves significant speedups of up to 30% on ARM Cortex-M4 and 55% on Apple M2 compared to the state-of-the-art Number-Theoretic Transform (NTT)…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsParallel Computing and Optimization Techniques · Low-power high-performance VLSI design · Interconnection Networks and Systems
