Hardware Generation and Exploration of Lookup Table-Based Accelerators for 1.58-bit LLM Inference

Robin Geens; Joran Heldens; Joren Dumoulin; Marian Verhelst

arXiv:2604.25183·cs.AR·April 29, 2026

TL;DR

This paper systematically explores and formalizes the design space of lookup table-based accelerators for 1.58-bit LLM inference, providing a hardware generator and cost model for efficient design and evaluation.

Contribution

The paper introduces a formal framework and an open-source generator for LUT-based accelerators, enabling comprehensive exploration and fair comparison of architectural choices.

Findings

01. Optimal architecture depends on activation data type.
02. Maximizing core size improves area density.
03. Optimized designs reduce area by 2.2x over baselines.

Abstract

Ternary weight quantization (e.g., BitNet b1.58) offers a promising path to mitigate the memory bandwidth bottleneck in Large Language Model (LLM) inference. However, conventional compute platforms lack native support for ternary-weight arithmetic, often relying on inefficient dequantization. Lookup table (LUT)-based hardware architectures provide an effective alternative by replacing multiplications with conditional additions, but their design space remains largely unexplored. Existing designs rely on heuristic parameter selection, lacking a systematic understanding of the architectural trade-offs. This work addresses this gap by formalizing the design space of ternary LUT-based accelerators and presenting an open-source hardware generator coupled with an analytical cost model, validated against synthesis in TSMC 16nm technology. By spanning the full architectural space, this framework…
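To make the idea of replacing multiplications with conditional additions concrete, here is a minimal software sketch of a ternary LUT-based matrix-vector product. It is not the paper's hardware generator or cost model. The function names, the group size `g`, and the dictionary-based table are all assumptions chosen for illustration. Each weight takes one of three values (-1, 0, +1), which carries log2(3) ≈ 1.58 bits of information. For every group of `g` activations, the sketch precomputes all 3**g possible partial sums using only additions and subtractions, then each weight row selects one entry per group instead of multiplying.

```python
from itertools import product


def build_lut(acts):
    """Return all 3**g partial sums for one group of g activations.

    Only conditional add/subtract is used: +1 adds, -1 subtracts, 0 skips.
    (Hypothetical helper, not taken from the paper.)
    """
    lut = {}
    for pattern in product((-1, 0, 1), repeat=len(acts)):
        total = 0
        for w, a in zip(pattern, acts):
            if w == 1:
                total += a
            elif w == -1:
                total -= a
        lut[pattern] = total
    return lut


def lut_matvec(weights, acts, g=2):
    """Ternary matrix-vector product via per-group lookup tables.

    `g` is an assumed group size; the activation length must be a multiple of g.
    """
    assert len(acts) % g == 0
    # One table per activation group, shared by every weight row.
    luts = [build_lut(acts[i:i + g]) for i in range(0, len(acts), g)]
    out = []
    for row in weights:
        acc = 0
        for k, lut in enumerate(luts):
            # The ternary weight pattern acts as the table index.
            acc += lut[tuple(row[k * g:(k + 1) * g])]
        out.append(acc)
    return out


def reference_matvec(weights, acts):
    """Plain multiply-accumulate reference, used only for comparison."""
    return [sum(w * a for w, a in zip(row, acts)) for row in weights]


weights = [[1, -1, 0, 1], [0, 0, -1, -1], [-1, 1, 1, 0]]
acts = [3, -2, 5, 7]
result = lut_matvec(weights, acts, g=2)  # [12, -12, 0]
```

The tables are built once per activation group and reused across all weight rows, so the cost of building them is amortized when the weight matrix has many rows. The table size grows as 3**g, so even this toy version suggests why parameters such as the group size involve trade-offs of the kind the abstract describes.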
