Vec-LUT: Vector Table Lookup for Parallel Ultra-Low-Bit LLM Inference on Edge Devices

Xiangyu Li; Chengyu Yin; Weijun Wang; Jianyu Wei; Ting Cao; Yunxin Liu

arXiv:2512.06443 · cs.DC · April 15, 2026

TL;DR

Vec-LUT introduces a vector lookup-table (LUT) paradigm for parallel ultra-low-bit LLM inference on edge devices, improving memory bandwidth utilization and outperforming state-of-the-art baselines by up to 4.2x.

Contribution

The paper proposes the vector LUT approach, together with a new tensor layout and cache-aware techniques, to make parallel ultra-low-bit LLM inference more efficient.

Findings

01 Vec-LUT outperforms state-of-the-art baselines by up to 4.2x.

02 Implemented in llama.cpp and tested on 5 edge devices with 3 LLMs.

03 Reduces memory bandwidth underutilization in LUT-based inference.

Abstract

Large language models (LLMs) are increasingly deployed on edge devices. To meet strict resource constraints, real-world deployment has pushed LLM quantization from 8-bit to 4-bit, 2-bit, and now 1.58-bit. Combined with lookup table (LUT)-based inference, CPUs run these ultra-low-bit LLMs even faster than NPUs, opening new opportunities for ubiquitous on-device intelligence. However, this paper identifies that LUT-based inference underutilizes memory bandwidth during parallel inference, which is required for prefilling, test-time scaling, and other multi-token scenarios. The root cause is the scalar LUT paradigm, which performs repetitive and non-contiguous memory accesses for each token. To solve the issue, we propose vector LUT, a new lookup paradigm that constructs a unified LUT across parallel tokens, and performs a single $1 \to N$ lookup per index. To realize it…
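The contrast the abstract draws between the scalar and vector LUT paradigms can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the group size `G`, the ternary weight encoding, and the function names `pack_weights`, `scalar_lut_matmul`, and `vector_lut_matmul` are all assumptions made for illustration, and the sketch ignores the tensor layout and cache-aware techniques the paper mentions.

```python
from itertools import product

G = 2  # hypothetical group size: weights are indexed G at a time
# All 3^G ternary weight patterns, and a map from pattern to table index.
PATTERNS = list(product((-1, 0, 1), repeat=G))
PATTERN_ID = {p: i for i, p in enumerate(PATTERNS)}


def pack_weights(W):
    """Replace each group of G ternary weights with its pattern index."""
    return [
        [PATTERN_ID[tuple(row[k:k + G])] for k in range(0, len(row), G)]
        for row in W
    ]


def group_dot(x, g, pattern):
    """Dot product of the g-th activation group of x with one pattern."""
    return sum(a * w for a, w in zip(x[g * G:(g + 1) * G], pattern))


def scalar_lut_matmul(X, W_idx):
    """Scalar LUT: a separate table per token; each lookup yields 1 value."""
    n_groups = len(X[0]) // G
    out = []
    for x in X:  # tables are rebuilt and re-read for every token
        tables = [[group_dot(x, g, p) for p in PATTERNS]
                  for g in range(n_groups)]
        out.append([sum(tables[g][i] for g, i in enumerate(row))
                    for row in W_idx])
    return out  # shape [N][M]


def vector_lut_matmul(X, W_idx):
    """Vector LUT: one unified table; each lookup yields N values."""
    n_tokens = len(X)
    n_groups = len(X[0]) // G
    # tables[g][i] holds one entry per token, stored contiguously.
    tables = [[[group_dot(x, g, p) for x in X] for p in PATTERNS]
              for g in range(n_groups)]
    out_t = []
    for row in W_idx:
        acc = [0] * n_tokens
        for g, i in enumerate(row):
            vec = tables[g][i]  # a single 1 -> N lookup per weight index
            acc = [a + v for a, v in zip(acc, vec)]
        out_t.append(acc)
    # Transpose from [M][N] back to [N][M] to match the scalar version.
    return [list(col) for col in zip(*out_t)]
```

Both functions compute the same product of N activation rows with an M x K ternary weight matrix (K must be a multiple of `G`). In the scalar version each weight index is looked up once per token, while in the vector version it is looked up once and returns a contiguous run of N values, which is the access pattern the abstract credits with better memory bandwidth utilization.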

Code & Models

Repositories

OpenBitSys/vlut.cpp (GitHub)