Research on LLM Acceleration Using the High-Performance RISC-V Processor "Xiangshan" (Nanhu Version) Based on the Open-Source Matrix Instruction Set Extension (Vector Dot Product)
Xu-Hao Chen, Si-Peng Hu, Hong-Chao Liu, Bo-Ran Liu, Dan Tang, Di Zhao

TL;DR
This paper presents a specialized RISC-V processor extension with vector dot product instructions, significantly accelerating large language model inference for edge AI while maintaining low power and hardware overhead.
Contribution
It introduces a novel vector dot product instruction set extension for RISC-V, implemented on the XiangShan Nanhu processor, to improve LLM inference efficiency on edge devices.
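As a rough illustration of the operation such an extension targets, the sketch below contrasts a scalar multiply-accumulate loop with a software model of a fused dot-product step that consumes several elements at once. It is not the paper's instruction definition: the lane count `LANES` and the helper names `vdot_step` and `dot_chunked` are hypothetical, and this summary does not state the actual element width, encoding, or accumulator format.

```python
from typing import Sequence

# Hypothetical number of element pairs consumed per fused step (not from the paper).
LANES = 8


def dot_scalar(a: Sequence[int], b: Sequence[int]) -> int:
    """Baseline: one multiply-accumulate per loop iteration."""
    acc = 0
    for x, y in zip(a, b):
        acc += x * y
    return acc


def vdot_step(acc: int, a_chunk: Sequence[int], b_chunk: Sequence[int]) -> int:
    """Software model of one fused dot-product step over up to LANES element pairs."""
    return acc + sum(x * y for x, y in zip(a_chunk, b_chunk))


def dot_chunked(a: Sequence[int], b: Sequence[int]) -> int:
    """Dot product expressed as a sequence of fused steps."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    acc = 0
    for i in range(0, len(a), LANES):
        acc = vdot_step(acc, a[i:i + LANES], b[i:i + LANES])
    return acc
```

Both functions return the same value. The difference in the model is only the number of steps: `dot_chunked` needs roughly `len(a) / LANES` fused steps where the scalar loop needs `len(a)` iterations, which is the kind of reduction a dedicated dot-product instruction aims for.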
Findings
FPGA tests show over 4x speedup in vector dot product calculations.
GPT-2 inference speed improved by ~30% with minimal hardware overhead.
Power consumption remains low despite the performance gains.
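The two speedup figures above can be related through Amdahl's law. The sketch below is a back-of-the-envelope check only, under assumptions not confirmed by this summary: that "~30%" means a 1.3x end-to-end speedup, that the dot-product kernel is sped up by exactly 4x (the source says "over 4x"), and that all other work is unchanged. The function names are illustrative.

```python
def amdahl_speedup(fraction: float, kernel_speedup: float) -> float:
    """Overall speedup when `fraction` of runtime is accelerated by `kernel_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / kernel_speedup)


def implied_fraction(overall_speedup: float, kernel_speedup: float) -> float:
    """Invert Amdahl's law: fraction of baseline runtime spent in the accelerated kernel."""
    return (1.0 - 1.0 / overall_speedup) / (1.0 - 1.0 / kernel_speedup)


# Assumed inputs: 1.3x overall, 4x on the dot-product kernel.
f = implied_fraction(1.3, 4.0)
print(f"implied dot-product share of baseline runtime: {f:.1%}")
```

Under these assumptions, the dot-product kernel would account for roughly 31% of baseline GPT-2 inference time. The summary does not report this share, so it should be read as a consistency check, not as a measured result.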
Abstract
To meet the high-performance and low-power requirements of edge AI, this study designs a specialized instruction set processor based on the RISC-V instruction set architecture, addressing practical issues in digital signal processing for edge devices. The design improves the execution efficiency of edge AI and reduces its energy consumption with limited hardware overhead, meeting the demand for efficient large language model (LLM) inference in edge AI applications. The main contributions of this paper are as follows: To suit the characteristics of large language models, the RISC-V instruction set was extended with custom instructions for vector dot product calculations, which accelerate LLM computation on dedicated vector dot product hardware. Based on the open-source high-performance RISC-V processor core XiangShan…
Taxonomy
Topics: Advanced Algorithms and Applications · Optical Systems and Laser Technology · Real-time simulation and control systems
