Hybrid Systolic Array Accelerator with Optimized Dataflow for Edge Large Language Model Inference
Chun-Ting Chen, HanGyeol Mun, Jian Meng, Mohamed S. Abdelfattah, Jae-sun Seo

TL;DR
This paper introduces a hybrid systolic array accelerator with an optimized dataflow and MXINT4 weight quantization for edge inference of large language models, reducing external memory access while improving throughput per unit area.
Contribution
It proposes a hybrid systolic array (HSA) architecture with a tailored dataflow, MXINT4 weight quantization, and optimized units for non-linear operations such as RMSNorm and rotary position embedding, targeting high efficiency with minimal accuracy loss in edge LLM inference.
Findings
Achieves 247/117 tokens/s/mm² on a 1.3B-parameter LLM
More than 2.45×/13.5× improvement over existing methods
Maintains high energy efficiency with minimal accuracy loss
Abstract
Edge inference for large language models (LLMs) offers secure, low-latency, and cost-effective inference solutions. We emphasize that an edge accelerator should achieve high area efficiency and minimize external memory access (EMA) during the memory-bound decode stage, while maintaining high energy efficiency during the compute-intensive prefill stage. This paper proposes an edge LLM inference accelerator featuring a hybrid systolic array (HSA) architecture that optimizes inference efficiency in both stages. To further reduce EMA, we adopt MXINT4 weight quantization and propose an optimized dataflow tailored for HSA, ensuring negligible dequantization overhead and achieving 100% hardware utilization with minimal accuracy loss under edge DRAM bandwidth constraints. For non-linear operations, we incorporate optimized root mean square normalization (RMSNorm) and rotary position embedding…
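The abstract names MXINT4 weight quantization but does not spell out the format. As a rough illustration only, the sketch below shows one common reading of a block-scaled 4-bit integer scheme: each block of weights shares a single power-of-two scale, and each weight is stored as a signed 4-bit integer. The block size of 32, the 8-bit shared exponent, and all function names are assumptions made for this example, not details taken from the paper.

```python
import math

BLOCK = 32           # assumed number of weights sharing one scale
QMIN, QMAX = -8, 7   # signed 4-bit integer range
EXP_BITS = 8         # assumed storage for the shared exponent

def quantize_block(block):
    """Return (shared_exp, ints) such that weight ~= int * 2**shared_exp."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 0, [0] * len(block)
    # Smallest power-of-two scale that keeps amax / scale within QMAX.
    shared_exp = math.ceil(math.log2(amax / QMAX))
    scale = 2.0 ** shared_exp
    ints = [max(QMIN, min(QMAX, round(x / scale))) for x in block]
    return shared_exp, ints

def dequantize_block(shared_exp, ints):
    """Dequantization is a multiply by a power of two (a shift in hardware)."""
    scale = 2.0 ** shared_exp
    return [q * scale for q in ints]

def quantize(weights):
    """Split a flat weight list into blocks and quantize each one."""
    return [quantize_block(weights[i:i + BLOCK])
            for i in range(0, len(weights), BLOCK)]

def dequantize(blocks):
    """Rebuild an approximate flat weight list from quantized blocks."""
    out = []
    for shared_exp, ints in blocks:
        out.extend(dequantize_block(shared_exp, ints))
    return out

def bits_per_weight(block=BLOCK, exp_bits=EXP_BITS):
    """Average storage cost: 4 bits per weight plus the amortized exponent."""
    return 4 + exp_bits / block

if __name__ == "__main__":
    weights = [math.sin(i) for i in range(64)]
    restored = dequantize(quantize(weights))
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print("bits per weight:", bits_per_weight())
    print("worst-case error:", worst)
```

Under these assumptions each weight costs 4.25 bits on average, so a 1.3B-parameter model would occupy roughly 0.69 GB of weight storage, compared with 2.6 GB at 16 bits per weight. Because the shared scale is a power of two, dequantization reduces to a shift, which is consistent with the abstract's claim of negligible dequantization overhead.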
Taxonomy
Topics: Speech Recognition and Synthesis · Robotics and Automated Systems · Advanced Data Processing Techniques
