Efficient Kernel Mapping and Comprehensive System Evaluation of LLM Acceleration on a CGLA
Takuto Ando, Yu Eto, Ayumu Takeuchi, Yasuhiko Nakashima

TL;DR
This paper evaluates a flexible, non-AI-specific CGLA accelerator for LLM inference, reporting up to 44.4x better Power-Delay Product and up to 11.5x lower Energy-Delay Product than a GPU, and identifying host-accelerator data transfer as a key bottleneck for future design optimization.
Contribution
It provides the first comprehensive end-to-end evaluation of a CGLA accelerator for LLMs, highlighting its energy efficiency and flexibility compared to traditional GPU solutions.
Findings
Achieves up to 44.4x better Power-Delay Product (PDP) than a GPU
Reduces Energy-Delay Product (EDP) by up to 11.5x compared to a GPU
Identifies host-accelerator data transfer as a primary bottleneck
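The two metrics in these findings follow the standard definitions: Power-Delay Product is power times delay (the energy spent on one task), and Energy-Delay Product is that energy times delay again. The sketch below shows how an improvement factor might be computed from them. All numbers are hypothetical placeholders chosen for easy arithmetic, not measurements from the paper, and the function names are illustrative only.

```python
def power_delay_product(power_w: float, delay_s: float) -> float:
    """PDP = P * t: energy in joules consumed by one task."""
    return power_w * delay_s


def energy_delay_product(power_w: float, delay_s: float) -> float:
    """EDP = E * t = P * t^2: energy weighted by how long the task takes."""
    return power_delay_product(power_w, delay_s) * delay_s


def improvement(baseline: float, candidate: float) -> float:
    """How many times lower the candidate's metric is than the baseline's."""
    return baseline / candidate


# HYPOTHETICAL values, not taken from the paper.
gpu_power_w, gpu_delay_s = 300.0, 1.0   # fast but power-hungry baseline
acc_power_w, acc_delay_s = 5.0, 4.0     # slower, low-power accelerator

pdp_gain = improvement(
    power_delay_product(gpu_power_w, gpu_delay_s),
    power_delay_product(acc_power_w, acc_delay_s),
)
edp_gain = improvement(
    energy_delay_product(gpu_power_w, gpu_delay_s),
    energy_delay_product(acc_power_w, acc_delay_s),
)

print(f"PDP improvement: {pdp_gain:.2f}x")
print(f"EDP improvement: {edp_gain:.2f}x")
```

Because EDP counts delay twice, a device that saves energy but runs slower than its baseline shows a smaller EDP gain than PDP gain, which is the pattern in this toy example.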
Abstract
Large Language Models (LLMs) demand substantial computational resources, resulting in high energy consumption on GPUs. To address this challenge, we focus on Coarse-Grained Reconfigurable Arrays (CGRAs) as an effective alternative that provides a trade-off between energy efficiency and programmability. This paper presents the first comprehensive, end-to-end evaluation of a non-AI-specialized Coarse-Grained Linear Array (CGLA) accelerator for the state-of-the-art Qwen LLM family. The architecture has a general-purpose, task-agnostic design, yet its flexible instruction set allows for domain-specific adaptations. This flexibility enables the architecture to achieve high efficiency for sustainable LLM inference. We assess the performance of our architecture on an FPGA prototype using the widely adopted llama.cpp framework. We then project its potential as a 28nm ASIC and compare it against…
Taxonomy
Topics: Advanced Neural Network Applications · Natural Language Processing Techniques · Big Data and Digital Economy
