Benchmarking NIST-Standardised ML-KEM and ML-DSA on ARM Cortex-M0+: Performance, Memory, and Energy on the RP2040
Rojin Chhetri

TL;DR
This paper provides the first systematic benchmarks of NIST-standardized ML-KEM and ML-DSA algorithms on ARM Cortex-M0+ processors, measuring performance, memory, and energy consumption on the RP2040 platform.
Contribution
It introduces isolated algorithm-level benchmarks for ML-KEM and ML-DSA on constrained 32-bit hardware, with open-source code for reproducibility.
Findings
ML-KEM-512 completes a full key exchange in 35.7 ms, 17x faster than a complete ECDH P-256 key agreement on the same hardware.
ML-DSA signing shows high latency variance due to rejection sampling.
Cortex-M0+ incurs only a 1.8-1.9x slowdown relative to Cortex-M4 results.
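To illustrate why rejection sampling produces variable signing latency, the sketch below models signing as a loop that repeats until a candidate signature is accepted, so the number of attempts follows a geometric distribution. The acceptance probability of 0.25 is a hypothetical value chosen for illustration only; it is not taken from the paper or from FIPS 204, and the function name is likewise invented here.

```python
import random
import statistics


def simulate_signing_attempts(accept_prob, trials, seed=0):
    """Return the number of attempts needed per simulated signature.

    Each attempt is accepted independently with probability accept_prob
    (an illustrative assumption, not a measured ML-DSA parameter).
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    counts = []
    for _ in range(trials):
        attempts = 1
        # Retry until a candidate passes the (simulated) rejection checks.
        while rng.random() >= accept_prob:
            attempts += 1
        counts.append(attempts)
    return counts


counts = simulate_signing_attempts(accept_prob=0.25, trials=10_000)
mean_attempts = statistics.mean(counts)
stdev_attempts = statistics.pstdev(counts)
print(f"mean attempts:  {mean_attempts:.2f}")
print(f"stdev attempts: {stdev_attempts:.2f}")
print(f"min / max:      {min(counts)} / {max(counts)}")
```

If each attempt costs roughly the same time, latency scales with the attempt count, so the spread between the minimum and maximum above maps directly onto the spread in signing time. This is one reason a single mean figure can understate worst-case signing latency.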
Abstract
The migration to post-quantum cryptography is urgent for Internet of Things devices with 10-20 year lifespans, yet no systematic benchmarks exist for the finalised NIST standards on the most constrained 32-bit processor class. This paper presents the first isolated algorithm-level benchmarks of ML-KEM (FIPS 203) and ML-DSA (FIPS 204) on ARM Cortex-M0+, measured on the RP2040 (Raspberry Pi Pico) at 133 MHz with 264 KB SRAM. Using PQClean reference C implementations, we measure all three security levels of ML-KEM (512/768/1024) and ML-DSA (44/65/87) across key generation, encapsulation/signing, and decapsulation/verification. ML-KEM-512 completes a full key exchange in 35.7 ms with an estimated energy cost of 2.83 mJ (datasheet power model), 17x faster than a complete ECDH P-256 key agreement on the same hardware. ML-DSA signing exhibits high latency variance due to rejection sampling…
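The energy figure above can be related to the latency with simple arithmetic. The sketch below assumes the datasheet power model amounts to a constant average power multiplied by execution time; that is an assumption about the method, not something the abstract spells out. The implied power and the ECDH latency are back-of-envelope values derived from the rounded numbers quoted above, not figures reported by the paper, and the helper names are hypothetical.

```python
def implied_power_mw(energy_mj, time_ms):
    """Average power implied by an energy/time pair (mJ / ms = W)."""
    return energy_mj / time_ms * 1000.0


def energy_mj(power_mw, time_ms):
    """Energy under a constant-power model: E = P * t."""
    return power_mw * time_ms / 1000.0


# Figures quoted in the abstract for a full ML-KEM-512 key exchange.
MLKEM512_TIME_MS = 35.7
MLKEM512_ENERGY_MJ = 2.83

# Derived here, assuming constant average power during the exchange.
power_mw = implied_power_mw(MLKEM512_ENERGY_MJ, MLKEM512_TIME_MS)

# Rough ECDH P-256 latency implied by the rounded "17x faster" claim.
ecdh_time_ms = 17 * MLKEM512_TIME_MS

print(f"implied average power: {power_mw:.1f} mW")
print(f"implied ECDH latency:  {ecdh_time_ms:.0f} ms (rough, from rounded 17x)")
```

Under this model, energy scales linearly with latency, so any speedup in cycles translates into the same relative saving in estimated energy.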
