KWT-Tiny: RISC-V Accelerated, Embedded Keyword Spotting Transformer

Aness Al-Qawlaq; Ajay Kumar M; Deepu John

arXiv:2407.16026 · cs.AR · November 21, 2025

TL;DR

This paper presents KWT-Tiny, a heavily compressed, hardware-accelerated Transformer model for keyword spotting on RISC-V edge devices. It shrinks the model from 2.42 MB to 1.65 kB and speeds up inference by about 5x, at the cost of roughly 10% accuracy.

Contribution

The paper introduces an ultra-small Transformer model for keyword spotting on low-power embedded devices, compressed through retraining and quantisation and accelerated with custom RISC-V instructions for the GELU and SoftMax operations.

Findings

01

Model size reduced from 2.42 MB to 1.65 kB

02

Inference sped up by about 5x (26 million to 5.5 million clock cycles) with custom RISC-V instructions for GELU and SoftMax

03

Retrained model is 369x smaller with only a 10% loss in accuracy

Abstract

This paper explores the adaptation of Transformer-based models for edge devices through the quantisation and hardware acceleration of the ARM Keyword Transformer (KWT) model on a RISC-V platform. The model was targeted to run on 64 kB of RAM in bare-metal C using a custom-developed edge AI library. KWT-1 was retrained to be 369 times smaller, with only a 10% loss in accuracy, by reducing the output classes from 35 to 2. The retraining and quantisation reduced model size from 2.42 MB to 1.65 kB. The integration of custom RISC-V instructions that accelerated GELU and SoftMax operations enabled a 5x speedup and thus ~5x power reduction in inference, with inference clock cycle counts decreasing from 26 million to 5.5 million clock cycles while incurring a small area overhead of approximately 29%. The results demonstrate a viable method for porting and accelerating Transformer-based models in…
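
The headline ratios in the abstract can be checked directly from the quoted figures. The following is a minimal sketch in C, not code from the paper: the helper names `ratio` and `print_summary` are hypothetical, and it assumes decimal units (1 MB = 10^6 bytes, 1 kB = 10^3 bytes), which the abstract does not specify.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: how many times larger `before` is than `after`. */
static double ratio(double before, double after) {
    assert(after > 0.0);
    return before / after;
}

/* Prints the ratios implied by the figures quoted in the abstract. */
static void print_summary(void) {
    /* Cycle counts quoted in the abstract. */
    double speedup = ratio(26e6, 5.5e6);          /* about 4.73, i.e. "~5x" */

    /* Sizes quoted in the abstract; decimal units are an assumption. */
    double total_size = ratio(2.42e6, 1.65e3);    /* about 1467x overall */

    /* Factor left over once the quoted 369x retraining reduction is removed. */
    double remaining = total_size / 369.0;        /* about 3.97 */

    printf("speedup: %.2fx\n", speedup);
    printf("total size reduction: %.0fx\n", total_size);
    printf("factor beyond 369x retraining: %.2fx\n", remaining);
}
```

Calling `print_summary()` from a `main` function shows that the overall size reduction (about 1467x) is larger than the 369x attributed to retraining alone. The remaining factor of roughly 4 would be consistent with the quantisation step, for instance going from 32-bit to 8-bit weights, although the bit widths are an assumption and are not stated in the abstract.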

Taxonomy

Methods: Byte Pair Encoding · Layer Normalization · Label Smoothing · Linear Layer · Softmax · Attention Is All You Need · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Multi-Head Attention