PacQ: A SIMT Microarchitecture for Efficient Dataflow in Hyper-asymmetric GEMMs

Ruokai Yin; Yuhang Li; Priyadarshini Panda

arXiv:2502.18627·cs.AR·February 27, 2025

TL;DR

PacQ is a specialized SIMT microarchitecture that accelerates hyper-asymmetric GEMMs, which multiply packed low-precision INT weights by high-precision FP activations. It achieves up to 1.99x speedup and an 81.4% reduction in energy-delay product (EDP) over conventional baselines.

Contribution

The paper introduces a novel microarchitecture, PacQ, with co-optimized dataflow, packing strategies, and a dedicated multiplier unit for efficient hyper-asymmetric GEMMs.

Findings

01. Achieves up to 1.99x speedup over conventional baselines.

02. Reduces energy-delay product (EDP) by 81.4%.

03. Effectively accelerates weight-only quantized LLM inference.
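
As a reading aid for the EDP figure: energy-delay product is energy multiplied by delay, so an 81.4% reduction fixes the ratio of the two products, as worked out below. The symbols E (energy) and T (delay) are introduced here for illustration only. This page does not state how the reduction splits between energy and delay, so the calculation does not attempt to separate them.

```latex
\mathrm{EDP} = E \cdot T,
\qquad
\frac{\mathrm{EDP}_{\text{PacQ}}}{\mathrm{EDP}_{\text{baseline}}}
  = 1 - 0.814 = 0.186
```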

Abstract

Weight-only quantization has been widely explored in large language models (LLMs) to reduce memory storage and data loading overhead. During deployment on single-instruction-multiple-threads (SIMT) architectures, weights are stored in low-precision integer (INT) format, while activations remain in full-precision floating-point (FP) format to preserve inference accuracy. Although memory footprint and data loading requirements for weight matrices are reduced, computation performance gains remain limited due to the need to convert weights back to FP format through unpacking and dequantization before GEMM operations. In this work, we investigate methods to accelerate GEMM operations involving packed low-precision INT weights and high-precision FP activations, defining this as the hyper-asymmetric GEMM problem. Our approach co-optimizes tile-level packing and dataflow strategies for INT…
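
The conventional flow that the abstract describes as the bottleneck (unpack, dequantize, then run an FP GEMM) can be sketched in NumPy as follows. This is a minimal illustration of that baseline, not PacQ's method. The packing layout (two unsigned 4-bit values per byte), the per-column `scale`, the fixed `zero_point` of 8, and all function names are assumptions made for the example. None of them come from the paper.

```python
import numpy as np

def pack_int4(w_q):
    """Pack unsigned 4-bit values (0..15) two per byte along the last axis.
    Assumed layout: even column in the low nibble, odd column in the high nibble."""
    w_q = np.asarray(w_q, dtype=np.uint8)
    assert w_q.shape[-1] % 2 == 0, "last axis must have even length"
    lo = w_q[..., 0::2]
    hi = w_q[..., 1::2]
    return (lo | (hi << 4)).astype(np.uint8)

def unpack_int4(packed):
    """Inverse of pack_int4: recover one 4-bit value per uint8 element."""
    out = np.empty(packed.shape[:-1] + (packed.shape[-1] * 2,), dtype=np.uint8)
    out[..., 0::2] = packed & 0x0F
    out[..., 1::2] = packed >> 4
    return out

def dequantize(w_q, scale, zero_point):
    """Map integer codes back to floating point: (q - zero_point) * scale."""
    return (w_q.astype(np.float32) - zero_point) * scale

def baseline_gemm(x, packed_w, scale, zero_point=8):
    """Baseline flow: unpack and dequantize every weight, then do an FP GEMM.
    The extra unpack/dequantize work is the overhead the abstract refers to."""
    w_fp = dequantize(unpack_int4(packed_w), scale, zero_point)
    return x @ w_fp

# Small example: 4x8 FP activations times 8x6 INT4 weights.
rng = np.random.default_rng(0)
w_q = rng.integers(0, 16, size=(8, 6), dtype=np.uint8)     # hypothetical quantized weights
scale = rng.uniform(0.01, 0.1, size=6).astype(np.float32)  # hypothetical per-column scales
x = rng.standard_normal((4, 8)).astype(np.float32)

packed_w = pack_int4(w_q)  # shape (8, 3): half the bytes of the unpacked codes
y = baseline_gemm(x, packed_w, scale)
```

Here the packed weights occupy half the bytes of the unpacked codes, but every weight is still expanded to FP before the multiply. This matches the abstract's point that memory and loading costs fall while compute gains stay limited.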

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Embedded Systems Design Techniques · Big Data and Digital Economy