TL;DR
LOCALUT exploits the capacity-computation tradeoff in LUT-based processing-in-memory (PIM) architectures to make low-bit quantized DNN inference more efficient, reaching a 1.82x geometric mean speedup on real hardware by optimizing LUT size and execution strategy.
Contribution
The paper introduces LOCALUT, a novel LUT-based PIM design that reduces LUT size and improves performance through canonicalization, reordering, and streaming techniques.
Findings
Achieved a 1.82x geometric mean speedup on real hardware.
Demonstrated effectiveness across various numeric precisions and DNN models.
Proposed techniques reduce LUT redundancy and improve data reuse.
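To make the idea of LUT redundancy concrete: a table that stores dot products of small packed operand groups contains many entries that differ only in the order of their (weight, activation) pairs, and all of those share one result. The sketch below counts how many entries remain when every key is first put into a canonical order. It is a minimal illustration under stated assumptions, not LOCALUT's actual scheme: the sorting rule, the 2-bit value ranges, and the names `canonicalize` and `count_entries` are all hypothetical.

```python
from itertools import product


def dot(weights, acts):
    """Plain multiply-accumulate over one packed group."""
    return sum(w * a for w, a in zip(weights, acts))


def canonicalize(weights, acts):
    """Jointly reorder the (weight, activation) pairs into sorted order.

    Assumption for illustration: sorting the pairs is one possible canonical
    form. The dot product is unchanged because addition is commutative.
    """
    pairs = sorted(zip(weights, acts))
    canon_w, canon_a = zip(*pairs)
    return canon_w, canon_a


def count_entries(weight_vals, act_vals, pack):
    """Count LUT keys with and without canonicalization."""
    full, canon = set(), set()
    for ws in product(weight_vals, repeat=pack):
        for acts in product(act_vals, repeat=pack):
            # Canonicalization must never change the stored result.
            assert dot(ws, acts) == dot(*canonicalize(ws, acts))
            full.add((ws, acts))
            canon.add(canonicalize(ws, acts))
    return len(full), len(canon)


# Hypothetical setting: 2-bit signed weights, 2-bit unsigned activations,
# two MACs packed per lookup.
full_size, canon_size = count_entries(range(-2, 2), range(4), pack=2)
print(full_size, canon_size)
```

In this toy setting the canonical table needs 136 entries instead of 256, at the cost of reordering operands before each lookup.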
Abstract
Lookup tables (LUTs) have recently gained attention as an alternative compute mechanism that maps input operands to precomputed results, eliminating the need for arithmetic logic. LUTs not only reduce logic complexity, but also naturally support diverse numerical precisions without requiring separate circuits for each bitwidth, an increasingly important feature in quantized DNNs. This creates a favorable tradeoff in PIM: memory capacity can be used in place of logic to increase computational throughput, aligning well with DRAM-PIM architectures that offer high bandwidth and easily available memory but limited logic density. In this work, we explore this capacity-computation tradeoff in LUT-based PIM designs, where memory capacity is traded for performance by packing multiple MAC operations into a single LUT lookup. Building on this insight, we propose LOCALUT, a PIM-based design for…
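The capacity-computation tradeoff described in the abstract can be sketched in a few lines: packing more multiply-accumulate (MAC) operations into each lookup cuts the number of lookups per dot product but grows the table exponentially. This is a software illustration only, assuming 2-bit signed weights and 2-bit unsigned activations; the function names `build_packed_lut` and `lut_dot` are hypothetical and do not come from the paper.

```python
from itertools import product


def build_packed_lut(weight_bits, act_bits, pack):
    """Precompute the dot product for every combination of `pack` weights
    and `pack` activations (assumed: signed weights, unsigned activations)."""
    w_vals = range(-(1 << (weight_bits - 1)), 1 << (weight_bits - 1))
    a_vals = range(1 << act_bits)
    lut = {}
    for ws in product(w_vals, repeat=pack):
        for acts in product(a_vals, repeat=pack):
            lut[(ws, acts)] = sum(w * a for w, a in zip(ws, acts))
    return lut


def lut_dot(lut, weights, acts, pack):
    """Dot product using one table lookup per group of `pack` MACs.

    Returns the result and the number of lookups performed. Assumes
    len(weights) is a multiple of `pack`.
    """
    total, lookups = 0, 0
    for i in range(0, len(weights), pack):
        key = (tuple(weights[i:i + pack]), tuple(acts[i:i + pack]))
        total += lut[key]
        lookups += 1
    return total, lookups


weights = [1, -2, 0, 1]
acts = [3, 1, 2, 0]
reference = sum(w * a for w, a in zip(weights, acts))

lut1 = build_packed_lut(weight_bits=2, act_bits=2, pack=1)
lut2 = build_packed_lut(weight_bits=2, act_bits=2, pack=2)

result1, lookups1 = lut_dot(lut1, weights, acts, pack=1)
result2, lookups2 = lut_dot(lut2, weights, acts, pack=2)
print(len(lut1), lookups1, len(lut2), lookups2)
```

With these assumed bitwidths, going from one to two MACs per lookup halves the lookups (4 to 2) while the table grows from 16 to 256 entries, which is why reducing LUT size matters for deeper packing.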
