Energy-Efficient Hardware Acceleration of Whisper ASR on a CGLA

Takuto Ando; Yu Eto; Ayumu Takeuchi; Yasuhiko Nakashima

arXiv:2511.02269 · cs.AR · November 5, 2025

TL;DR

Running Whisper's core computational kernel on IMAX, a CGLA accelerator, yields a projected 28 nm ASIC that is 1.90x more energy-efficient than the NVIDIA Jetson AGX Orin and 9.83x more energy-efficient than the NVIDIA RTX 4090 for the Q8_0 model, which points to its potential for sustainable edge ASR.

Contribution

First work, to the authors' knowledge, to execute and evaluate Whisper's core computational kernel on a CGLA (IMAX) and compare it against CPUs and GPUs, using an FPGA prototype and a projected 28 nm ASIC.

Findings

01

The projected 28 nm ASIC is 1.90x more energy-efficient than the NVIDIA Jetson AGX Orin for the Q8_0 model.

02

The projected 28 nm ASIC is 9.83x more energy-efficient than the NVIDIA RTX 4090 for the Q8_0 model.

03

CGLA shows promise for sustainable, power-efficient ASR on edge devices.
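
To make the "x more energy-efficient" figures concrete, here is a minimal sketch of one common way such a ratio can be computed. It assumes efficiency is compared as energy per task (average power times latency); the page does not state the exact metric the authors used. The function names and the numbers in the usage lines are hypothetical placeholders, not measurements from the paper.

```python
def energy_joules(power_watts: float, latency_seconds: float) -> float:
    """Energy for one task, assuming constant average power over the run."""
    return power_watts * latency_seconds


def efficiency_gain(baseline_energy: float, candidate_energy: float) -> float:
    """How many times less energy the candidate uses than the baseline."""
    if candidate_energy <= 0:
        raise ValueError("candidate energy must be positive")
    return baseline_energy / candidate_energy


# Placeholder values only: a baseline drawing 10 W for 2 s versus
# a candidate drawing 5 W for 2 s on the same task.
baseline = energy_joules(10.0, 2.0)
candidate = energy_joules(5.0, 2.0)
gain = efficiency_gain(baseline, candidate)
print(f"candidate is {gain:.2f}x more energy-efficient")
```

Under this definition a slower device can still come out ahead if its power draw is low enough, which is why latency and power have to be reported together.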

Abstract

The rise of generative AI for tasks like Automatic Speech Recognition (ASR) has created a critical energy consumption challenge. While ASICs offer high efficiency, they lack the programmability to adapt to evolving algorithms. To address this trade-off, we implement and evaluate Whisper's core computational kernel on IMAX, a general-purpose Coarse-Grained Linear Array (CGLA) accelerator. To our knowledge, this is the first work to execute a Whisper kernel on a CGRA and compare its performance against CPUs and GPUs. Using hardware/software co-design, we evaluate our system via an FPGA prototype and project performance for a 28 nm ASIC. Our results demonstrate superior energy efficiency. The projected ASIC is 1.90x more energy-efficient than the NVIDIA Jetson AGX Orin and 9.83x more energy-efficient than the NVIDIA RTX 4090 for the Q8_0 model. This work positions CGLA as a promising platform for…
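
The abstract refers to a Q8_0 model without describing the format. As a rough illustration, the sketch below assumes the block layout used by ggml's Q8_0 quantization (32 signed 8-bit values sharing one half-precision scale) and shows a block-wise dot product of the kind a quantized matrix kernel is built from. The function names are hypothetical, and nothing here is taken from the authors' IMAX implementation.

```python
import struct

QK8_0 = 32  # values per block, as in ggml's Q8_0 layout (assumed here)


def _to_fp16(x: float) -> float:
    """Round a float through IEEE half precision, the stored scale type."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def _round_half_away(x: float) -> int:
    """Round to nearest integer, ties away from zero (like C's roundf)."""
    return int(x + 0.5) if x >= 0 else int(x - 0.5)


def quantize_q8_0(values):
    """Split values into blocks of 32 and return (scale, int8 list) pairs."""
    if len(values) % QK8_0 != 0:
        raise ValueError("length must be a multiple of 32")
    blocks = []
    for start in range(0, len(values), QK8_0):
        chunk = values[start:start + QK8_0]
        amax = max(abs(v) for v in chunk)
        scale = _to_fp16(amax / 127.0)
        inv = 1.0 / scale if scale else 0.0
        # Clamp because the fp16-rounded scale can push a value just past 127.
        quants = [max(-127, min(127, _round_half_away(v * inv))) for v in chunk]
        blocks.append((scale, quants))
    return blocks


def dot_q8_0(a_blocks, b_blocks):
    """Dot product of two Q8_0 vectors: integer MACs, one float scale per block."""
    total = 0.0
    for (sa, qa), (sb, qb) in zip(a_blocks, b_blocks):
        isum = sum(x * y for x, y in zip(qa, qb))  # integer multiply-accumulate
        total += sa * sb * isum
    return total


# Usage: compare the quantized dot product with the exact one.
a = [float(i % 32) for i in range(64)]
b = [1.0] * 64
exact = sum(x * y for x, y in zip(a, b))
approx = dot_q8_0(quantize_q8_0(a), quantize_q8_0(b))
print(exact, approx)
```

The inner loop is pure integer multiply-accumulate, with only one floating-point multiply per block. That structure is what makes 8-bit formats attractive for low-power hardware in general.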

Taxonomy

Topics: Embedded Systems Design Techniques · Parallel Computing and Optimization Techniques · Advanced Neural Network Applications