Efficient Kernel Mapping and Comprehensive System Evaluation of LLM Acceleration on a CGLA

Takuto Ando; Yu Eto; Ayumu Takeuchi; Yasuhiko Nakashima

arXiv:2512.00335 · cs.AR · December 2, 2025

Open Access

TL;DR

This paper evaluates a flexible, non-AI-specific Coarse-Grained Linear Array (CGLA) accelerator for LLM inference, demonstrating substantial energy-efficiency gains over GPUs and identifying host-accelerator data transfer as a key bottleneck for future design optimization.

Contribution

The paper provides the first comprehensive end-to-end evaluation of a CGLA accelerator for LLMs, highlighting its energy efficiency and flexibility relative to conventional GPU solutions.

Findings

1. Achieves up to 44.4x lower Power-Delay Product (PDP) than the GPU baseline
2. Reduces Energy-Delay Product (EDP) by up to 11.5x compared with the GPU baseline
3. Identifies host-accelerator data transfer as a primary bottleneck
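The two efficiency metrics above can be made concrete with a small sketch. It assumes the standard definitions, PDP = P × t (energy) and EDP = E × t = P × t², and reads "N x better" as the ratio baseline / candidate. Which power figure and which delay (per token, per request) the paper actually uses is not stated in this summary, and all numbers below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    """Average power (W) and delay (s) for one inference workload (hypothetical)."""
    power_w: float
    delay_s: float

    @property
    def pdp(self) -> float:
        # Power-Delay Product = P * t, i.e. energy in joules
        return self.power_w * self.delay_s

    @property
    def edp(self) -> float:
        # Energy-Delay Product = E * t = P * t^2, in joule-seconds
        return self.pdp * self.delay_s


def improvement(baseline: Measurement, candidate: Measurement) -> tuple[float, float]:
    """Return (PDP ratio, EDP ratio); values > 1 mean the candidate is better."""
    return baseline.pdp / candidate.pdp, baseline.edp / candidate.edp


# Hypothetical numbers for illustration only; not taken from the paper.
gpu = Measurement(power_w=300.0, delay_s=1.0)
accel = Measurement(power_w=5.0, delay_s=2.0)

pdp_ratio, edp_ratio = improvement(gpu, accel)
print(f"PDP improvement: {pdp_ratio:.1f}x")  # 300 J / 10 J = 30.0x
print(f"EDP improvement: {edp_ratio:.1f}x")  # 300 J*s / 20 J*s = 15.0x
```

Because EDP multiplies PDP by delay once more, the EDP ratio equals the PDP ratio times the delay ratio (t_baseline / t_candidate). A candidate that is slower than the baseline therefore shows a smaller gain in EDP than in PDP, as in this toy example.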

Abstract

Large Language Models (LLMs) demand substantial computational resources, resulting in high energy consumption on GPUs. To address this challenge, we focus on Coarse-Grained Reconfigurable Arrays (CGRAs) as an effective alternative that provides a trade-off between energy efficiency and programmability. This paper presents the first comprehensive, end-to-end evaluation of a non-AI-specialized Coarse-Grained Linear Array (CGLA) accelerator for the state-of-the-art Qwen LLM family. The architecture has a general-purpose, task-agnostic design, yet its flexible instruction set allows for domain-specific adaptations. This flexibility enables the architecture to achieve high efficiency for sustainable LLM inference. We assess the performance of our architecture on an FPGA prototype using the widely adopted llama.cpp framework. We then project its potential as a 28nm ASIC and compare it against…

Taxonomy

Topics: Advanced Neural Network Applications · Natural Language Processing Techniques · Big Data and Digital Economy