Hardware Generation and Exploration of Lookup Table-Based Accelerators for 1.58-bit LLM Inference
Robin Geens, Joran Heldens, Joren Dumoulin, Marian Verhelst

TL;DR
This paper systematically explores and formalizes the design space of lookup table-based accelerators for 1.58-bit LLM inference, providing a hardware generator and cost model for efficient design and evaluation.
Contribution
The paper introduces a formal framework and an open-source generator for LUT-based accelerators, enabling comprehensive exploration and fair comparison of architectural choices.
Findings
Optimal architecture depends on activation data type.
Maximizing core size improves area density.
Optimized designs reduce area by 2.2x over baselines.
Abstract
Ternary weight quantization (e.g., BitNet b1.58) offers a promising path to mitigate the memory bandwidth bottleneck in Large Language Model (LLM) inference. However, conventional compute platforms lack native support for ternary-weight arithmetic, often relying on inefficient dequantization. Lookup table (LUT)-based hardware architectures provide an effective alternative by replacing multiplications with conditional additions, but their design space remains largely unexplored. Existing designs rely on heuristic parameter selection, lacking a systematic understanding of the architectural trade-offs. This work addresses this gap by formalizing the design space of ternary LUT-based accelerators and presenting an open-source hardware generator coupled with an analytical cost model, validated against synthesis in TSMC 16nm technology. By spanning the full architectural space, this framework…
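To make the abstract's central idea concrete, the following is a minimal software sketch of a LUT-based ternary matrix-vector product. It is an illustration only, not the paper's hardware generator or architecture. The function names (`build_lut`, `encode`, `lut_matvec`), the group size `g`, and the base-3 index encoding are all assumptions chosen for this example. The sketch precomputes, for each group of `g` activations, every signed sum that a ternary weight pattern could select, using only additions and subtractions. Each output row then needs a single table lookup per group instead of `g` multiplications.

```python
from itertools import product


def build_lut(acts):
    """Return all 3**g signed sums of one activation group.

    Entry i corresponds to the ternary weight pattern whose base-3
    code (see encode) equals i. Only conditional adds/subtracts are used.
    """
    lut = []
    for pattern in product((-1, 0, 1), repeat=len(acts)):
        total = 0
        for w, a in zip(pattern, acts):
            if w == 1:
                total += a      # weight +1: add the activation
            elif w == -1:
                total -= a      # weight -1: subtract the activation
            # weight 0: skip the activation entirely
        lut.append(total)
    return lut


def encode(weights):
    """Map a group of ternary weights to a base-3 index, first weight most significant."""
    idx = 0
    for w in weights:
        idx = idx * 3 + (w + 1)  # shift {-1, 0, 1} to digits {0, 1, 2}
    return idx


def lut_matvec(weights, x, g=2):
    """Compute weights @ x for ternary weights using per-group lookup tables."""
    assert len(x) % g == 0, "activation length must be a multiple of g"
    out = [0] * len(weights)
    for start in range(0, len(x), g):
        # One table per activation group, shared by every output row.
        lut = build_lut(x[start:start + g])
        for r, row in enumerate(weights):
            out[r] += lut[encode(row[start:start + g])]
    return out


W = [[1, 0, -1, 1],
     [0, -1, -1, 0]]
x = [3, 5, 2, 7]
print(lut_matvec(W, x, g=2))  # [8, -7]
```

In this sketch the table grows as 3**g while the number of lookups per row shrinks as 1/g, so the group size is a tunable trade-off between table cost and lookup count. Treat this as a toy model of that kind of trade-off rather than a statement about the paper's parameters.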
