Mugi: Value Level Parallelism For Efficient LLMs
Daniel Price, Prabhu Vellaisamy, John Shen, Di Wu

TL;DR
This paper introduces Mugi, an architecture that uses value level parallelism (VLP) to make large language models more efficient and sustainable, through VLP-based nonlinear approximations and workload-specific optimizations.
Contribution
The paper generalizes value level parallelism for nonlinear approximations, optimizes it for small-batch GEMMs, and designs the Mugi architecture for full LLM workloads, achieving significant performance and energy efficiency improvements.
Findings
Up to 45x throughput improvement in nonlinear softmax operations.
Up to 668x energy efficiency gains for certain operations.
Reduced operational and embodied carbon for LLMs.
Abstract
Value level parallelism (VLP) has been proposed to improve the efficiency of large-batch, low-precision general matrix multiply (GEMM) between symmetric activations and weights. Transformer-based large language models (LLMs), however, involve more sophisticated operations than activation-weight GEMM. In this paper, we explore how VLP benefits LLMs. First, we generalize VLP for nonlinear approximations, outperforming existing nonlinear approximations in end-to-end LLM accuracy, performance, and efficiency. Our VLP approximation follows a value-centric approach, where important values are approximated with greater accuracy. Second, we efficiently optimize VLP for small-batch GEMMs with asymmetric inputs, leveraging timely LLM optimizations, including weight-only quantization, key-value (KV) cache quantization, and group query attention. Finally, we design a new VLP architecture, Mugi, to…
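To make the "value-centric" idea in the abstract concrete, here is a minimal software sketch of a softmax whose exponential is read from a lookup table with non-uniform resolution. It is not Mugi's hardware mechanism and is not taken from the paper. It assumes, for illustration only, that the "important" values are the shifted inputs closest to zero (the ones that dominate the softmax sum), so the table is fine-grained there and coarse elsewhere. All names and parameters (`build_table`, `approx_exp`, `approx_softmax`, the step sizes and limits) are hypothetical.

```python
import bisect
import math


def build_table(dense_step=0.125, dense_limit=-4.0, sparse_step=1.0, floor=-16.0):
    """Build breakpoints on [floor, 0] and the exact exp() at each one.

    Assumption for illustration: inputs near 0 matter most, so the range
    [dense_limit, 0] uses a fine step and [floor, dense_limit) a coarse one.
    The default steps are exactly representable in binary floating point,
    so the loops land exactly on dense_limit and 0.0.
    """
    points = []
    x = floor
    while x < dense_limit:
        points.append(x)
        x += sparse_step
    x = dense_limit
    while x <= 0.0:
        points.append(x)
        x += dense_step
    values = [math.exp(p) for p in points]
    return points, values


POINTS, VALUES = build_table()


def approx_exp(x, points=POINTS, values=VALUES):
    """Approximate exp(x) for x <= 0 by the table entry at the largest
    breakpoint that does not exceed x. Inputs below the table floor map to 0."""
    if x < points[0]:
        return 0.0
    # bisect_right returns the insertion index to the right of any equal entry,
    # so subtracting 1 gives the last breakpoint <= x (clamped for x > 0).
    i = min(bisect.bisect_right(points, x) - 1, len(points) - 1)
    return values[i]


def approx_softmax(xs):
    """Softmax using the table-based exponential.

    Subtracting the maximum keeps every shifted input in (-inf, 0], and the
    maximum itself maps to exactly 1.0, so the denominator is never zero.
    """
    m = max(xs)
    exps = [approx_exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def exact_softmax(xs):
    """Reference softmax with math.exp, used only for comparison."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]
```

Comparing `approx_softmax(xs)` against `exact_softmax(xs)` shows the trade-off: inputs that land in the fine-grained region near the maximum are approximated closely, while inputs far below the maximum get coarse (or zero) values but contribute little to the result.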
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Big Data and Digital Economy · Cloud Computing and Resource Management
