# Microarchitecture Design and Benchmarking of Custom SHA-3 Instruction for RISC-V

**Authors:** Alperen Bolat, Sakir Sezer, Kieran McLaughlin, Henry Hui

arXiv: 2508.20653 · 2025-08-29

## TL;DR

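This paper designs and benchmarks a custom SHA-3 instruction for RISC-V. Cycle-accurate GEM5 simulation and FPGA prototyping show performance improvements of up to 8.02x for RISC-V optimized SHA-3 software workloads and up to 46.31x for Keccak-specific software workloads, at a hardware cost of 15.09% more registers and an 11.51% increase in LUT utilization.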

## Contribution

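The paper presents a microarchitectural design that embeds the SHA-3 permutation as a custom instruction in a general-purpose RISC-V processor. It evaluates the design for pipelined simultaneous execution, storage utilization, and hardware cost, using cycle-accurate GEM5 simulation and an FPGA prototype.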

## Key findings

- Up to 8.02x performance improvement for RISC-V optimized SHA-3 software workloads
- Up to 46.31x performance improvement for Keccak-specific software workloads
- Hardware cost: a 15.09% increase in registers and an 11.51% increase in LUT utilization

## Abstract

Integrating cryptographic accelerators into modern CPU architectures presents unique microarchitectural challenges, particularly when extending instruction sets with complex and multistage operations. Hardware-assisted cryptographic instructions, such as Intel's AES-NI and ARM's custom instructions for encryption workloads, have demonstrated substantial performance improvements. However, efficient SHA-3 acceleration remains an open problem due to its distinct permutation-based structure and memory access patterns. Existing solutions primarily rely on standalone coprocessors or software optimizations, often avoiding the complexities of direct microarchitectural integration. This study investigates the architectural challenges of embedding a SHA-3 permutation operation as a custom instruction within a general-purpose processor, focusing on pipelined simultaneous execution, storage utilization, and hardware cost. In this paper, we investigate and prototype a SHA-3 custom instruction for the RISC-V CPU architecture. Using cycle-accurate GEM5 simulations and FPGA prototyping, our results demonstrate performance improvements of up to 8.02x for RISC-V optimized SHA-3 software workloads and up to 46.31x for Keccak-specific software workloads, with only a 15.09% increase in registers and an 11.51% increase in LUT utilization. These findings provide critical insights into the feasibility and impact of SHA-3 acceleration at the microarchitectural level, highlighting practical design considerations for future cryptographic instruction set extensions.
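For readers unfamiliar with the "SHA-3 permutation operation" the abstract refers to, the following is a minimal software sketch of the standard Keccak-f[1600] permutation and the SHA3-256 sponge built on it, as specified in FIPS 202. It is not the authors' hardware design, instruction encoding, or benchmark code; the function names (`rotl64`, `keccak_f1600`, `sha3_256`, `agrees_with_hashlib`) are illustrative assumptions. It is meant only to show the 24-round, 1600-bit state update that a software baseline runs once per 136-byte block, which is the kind of work a custom instruction would take over.

```python
import hashlib

MASK64 = (1 << 64) - 1


def rotl64(value: int, shift: int) -> int:
    """Rotate a 64-bit lane left by `shift` bits."""
    shift %= 64
    return ((value << shift) | (value >> (64 - shift))) & MASK64


def keccak_f1600(state: bytes) -> bytes:
    """Apply the 24-round Keccak-f[1600] permutation to a 200-byte state."""
    # lanes[x][y] holds one 64-bit little-endian lane of the 5x5 state.
    lanes = [[int.from_bytes(state[8 * (x + 5 * y):8 * (x + 5 * y) + 8], "little")
              for y in range(5)] for x in range(5)]
    lfsr = 1  # LFSR that generates the round-constant bits for the iota step
    for _ in range(24):
        # theta: mix each column's parity into the neighbouring columns
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
             for x in range(5)]
        d = [c[(x + 4) % 5] ^ rotl64(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi: rotate each lane and move it to its new position
        x, y = 1, 0
        current = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x][y] = lanes[x][y], rotl64(current, (t + 1) * (t + 2) // 2)
        # chi: the only non-linear step, applied along each row
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & MASK64 & row[(x + 2) % 5])
        # iota: XOR the round constant into lane (0, 0)
        for j in range(7):
            lfsr = ((lfsr << 1) ^ ((lfsr >> 7) * 0x71)) % 256
            if lfsr & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    out = bytearray(200)
    for x in range(5):
        for y in range(5):
            out[8 * (x + 5 * y):8 * (x + 5 * y) + 8] = lanes[x][y].to_bytes(8, "little")
    return bytes(out)


def sha3_256(message: bytes) -> bytes:
    """SHA3-256 sponge: rate 136 bytes, domain suffix 0x06, 32-byte digest."""
    rate = 136
    padded = bytearray(message) + b"\x06"          # SHA-3 domain bits + first pad bit
    padded += b"\x00" * (-len(padded) % rate)      # zero-fill to a whole block
    padded[-1] |= 0x80                             # final pad bit
    state = bytearray(200)
    for offset in range(0, len(padded), rate):
        for i in range(rate):
            state[i] ^= padded[offset + i]         # absorb one block
        state = bytearray(keccak_f1600(bytes(state)))  # one permutation per block
    return bytes(state[:32])


def agrees_with_hashlib(message: bytes) -> bool:
    """Cross-check the sketch against Python's built-in SHA3-256."""
    return sha3_256(message) == hashlib.sha3_256(message).digest()
```

Calling `agrees_with_hashlib(b"abc")` compares the sketch with the standard library's implementation. Each call to `keccak_f1600` touches all 25 lanes in every one of its 24 rounds, which gives a sense of why the permutation dominates SHA-3 software cost and is the natural unit to move into hardware.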

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.20653/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/2508.20653/full.md

## References

23 references — full list in the complete paper: https://tomesphere.com/paper/2508.20653/full.md

---
Source: https://tomesphere.com/paper/2508.20653