Platinum: Path-Adaptable LUT-Based Accelerator Tailored for Low-Bit Weight Matrix Multiplication
Haoxuan Shan, Cong Guo, Chiyue Wei, Feng Cheng, Junyao Zhang, Hai "Helen" Li, Yiran Chen

TL;DR
Platinum is an ASIC accelerator for low-bit weight matrix multiplication. It switches adaptively between execution paths and reduces LUT construction overhead, improving speed and energy efficiency for quantized neural networks.
Contribution
It introduces a path-adaptable LUT-based accelerator that builds its LUTs along offline-generated construction paths and switches between a general bit-serial execution path and an optimized ternary-weight path, reducing LUT construction overhead and boosting performance.
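The idea of a precomputed construction path can be illustrated with a minimal sketch. It assumes a bit-serial LUT whose entry m holds the sum of the activations selected by the set bits of m, and it uses a reflected Gray-code order so that each entry is derived from the previous one with a single addition or subtraction. The Gray-code ordering and the names `gray_path` and `build_lut` are assumptions chosen for illustration; the summary does not say how Platinum generates its paths.

```python
def gray_path(g):
    """Offline step: return (index, parent, position, sign) steps such that each
    LUT entry equals its parent entry plus or minus one activation."""
    path = []
    prev = 0
    for i in range(1, 1 << g):
        code = i ^ (i >> 1)           # reflected Gray code of i
        diff = code ^ prev            # exactly one bit differs from the previous code
        pos = diff.bit_length() - 1   # which activation enters or leaves the sum
        sign = 1 if code & diff else -1
        path.append((code, prev, pos, sign))
        prev = code
    return path


def build_lut(acts, path):
    """Online step: fill the LUT by following the precomputed path.
    Each entry costs one add or subtract; entry 0 is the empty sum."""
    lut = [0] * (1 << len(acts))
    for idx, parent, pos, sign in path:
        lut[idx] = lut[parent] + sign * acts[pos]
    return lut


if __name__ == "__main__":
    acts = [3, -1, 4]
    print(build_lut(acts, gray_path(len(acts))))
```

Because the path depends only on the group size and not on the activation values, it can be computed once ahead of time and replayed for every activation group, which is the kind of saving the construction-path idea is aiming at.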
Findings
Achieves up to 73.6x speedup over existing accelerators on BitNet b1.58-3B.
Reduces energy consumption by up to 32.4x.
Occupies a chip area of only 0.96 mm².
Abstract
The rapid scaling of large language models demands more efficient hardware. Quantization offers a promising trade-off between efficiency and performance. Ultra-low-bit quantization creates abundant opportunities for result reuse, so computation can be accelerated with lookup tables (LUTs). However, existing LUT-based methods incur computation and hardware overheads for LUT construction and rely solely on bit-serial computation, which is suboptimal for ternary-weight networks. We propose Platinum, a lightweight ASIC accelerator for integer-weight mixed-precision matrix multiplication (mpGEMM) using LUTs. Platinum reduces LUT construction overhead via offline-generated construction paths and supports both general bit-serial and optimized ternary-weight execution through adaptive path switching. On BitNet b1.58-3B, Platinum achieves up to 73.6x, 4.09x, and 2.15x…
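To make the two execution styles named in the abstract concrete, here is a minimal software sketch of LUT-based matrix-vector multiplication. It is not Platinum's datapath: the group sizes, the function names `bit_serial_matvec` and `ternary_matvec`, and the dictionary-based ternary table are all assumptions chosen for readability. The bit-serial version does one lookup per weight bit plane, while the ternary version indexes a table of 3^g partial sums once per group.

```python
from itertools import product


def bit_serial_matvec(W, acts, bits, g=4):
    """y = W @ acts for unsigned `bits`-bit integer weights.
    One LUT per group of g activations, one lookup per bit plane."""
    y = [0] * len(W)
    for start in range(0, len(acts), g):
        a = acts[start:start + g]
        # Entry m is the sum of the activations selected by the set bits of m.
        lut = [sum(x for j, x in enumerate(a) if (m >> j) & 1)
               for m in range(1 << len(a))]
        for r, row in enumerate(W):          # the same LUT is reused by every row
            w = row[start:start + g]
            for b in range(bits):
                # Pack bit b of each weight in the group into a LUT index.
                idx = sum(((wj >> b) & 1) << j for j, wj in enumerate(w))
                y[r] += lut[idx] * (1 << b)  # weight the partial sum by 2**b
    return y


def ternary_matvec(W, acts, g=3):
    """y = W @ acts for weights in {-1, 0, 1}.
    One table of 3**g partial sums per group, one lookup per group."""
    y = [0] * len(W)
    for start in range(0, len(acts), g):
        a = acts[start:start + g]
        lut = {p: sum(s * x for s, x in zip(p, a))
               for p in product((-1, 0, 1), repeat=len(a))}
        for r, row in enumerate(W):
            y[r] += lut[tuple(row[start:start + g])]
    return y


if __name__ == "__main__":
    acts = [2, -3, 5, 7, -1, 4]
    print(ternary_matvec([[1, 0, -1, 1, 1, 0]], acts))
    print(bit_serial_matvec([[3, 0, 1, 2, 3, 1]], acts, bits=2))
```

In this sketch each LUT is built once per activation group and shared across all weight rows, which is where the result reuse comes from. A ternary weight matrix pushed through the bit-serial routine would need two bit planes and therefore two lookups per group, whereas the ternary routine needs one.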
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Low-power high-performance VLSI design · Network Packet Processing and Optimization
