LUT-LLM: Efficient Large Language Model Inference with Memory-based Computations on FPGAs

Zifan He; Shengyu Ye; Rui Ma; Yang Wang; Jason Cong

arXiv:2511.06174 · cs.AR · March 24, 2026

TL;DR

LUT-LLM introduces a memory-based FPGA accelerator for large language models, leveraging table lookups and vector quantization to significantly improve inference speed and energy efficiency compared to GPUs.

Contribution

This work presents the first FPGA-based LLM inference method using memory-based computations with vector quantization, enabling scalable deployment of large models.

Findings

01 Achieves 1.10 to 3.29 times faster generation than GPUs.

02 Provides 3.05 to 6.60 times higher energy efficiency.

03 Reduces arithmetic operations by a factor of 4.

Abstract

The rapid development of large language models (LLMs) has greatly enhanced everyday applications. While many FPGA-based accelerators, with their flexibility for fine-grained data control, exhibit superior speed and energy efficiency compared to GPUs, recent GPU-specific optimizations have diminished this advantage. When limited to arithmetic-based computation, FPGAs often underperform GPUs because they have comparatively fewer computational resources. To address this challenge, we exploit a key advantage of FPGAs over GPUs: abundant distributed on-chip memory embedded among computational units. We believe that shifting LLM inference from arithmetic-based to memory-based computation through table lookups can improve efficiency on FPGAs enough to compete with GPUs. However, existing methods are inefficient or unable to scale and deploy language models due to algorithm and architecture design…
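To make the idea of memory-based computation concrete, here is a minimal sketch of a generic lookup-table matrix-vector product with vector-quantized activations. It is an illustration under stated assumptions, not the LUT-LLM architecture or algorithm: the function names (`nearest_centroid`, `build_tables`, `lut_matvec`), the sub-vector length `v`, and the toy codebooks and weights are all hypothetical. The point it shows is that once each input sub-vector is replaced by a codebook index, the multiply-accumulate work for that sub-vector collapses into one table read per output.

```python
# Generic LUT-based matrix-vector product with vector-quantized activations.
# Hypothetical illustration only; not the design described in the paper.

def nearest_centroid(sub, codebook):
    """Return the index of the codebook entry closest to `sub` (squared L2).
    Note: this search still uses arithmetic; only the matmul becomes lookups."""
    best_idx, best_dist = 0, float("inf")
    for idx, centroid in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(sub, centroid))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx

def build_tables(weights, codebooks, v):
    """Precompute tables[g][k][j]: dot product of centroid k of group g
    with the v weight rows of group g, for output column j."""
    n_out = len(weights[0])
    tables = []
    for g, codebook in enumerate(codebooks):
        rows = weights[g * v:(g + 1) * v]
        tables.append([
            [sum(centroid[i] * rows[i][j] for i in range(v)) for j in range(n_out)]
            for centroid in codebook
        ])
    return tables

def lut_matvec(x, codebooks, tables, v):
    """Approximate x @ weights using one table read per (group, output)."""
    n_out = len(tables[0][0])
    y = [0.0] * n_out
    for g, codebook in enumerate(codebooks):
        k = nearest_centroid(x[g * v:(g + 1) * v], codebook)
        for j in range(n_out):
            y[j] += tables[g][k][j]  # lookup and accumulate, no multiply
    return y

def dense_matvec(x, weights):
    """Reference arithmetic-based product for comparison."""
    return [sum(x[i] * weights[i][j] for i in range(len(x)))
            for j in range(len(weights[0]))]

# Toy setup (assumed values): 4 inputs, 3 outputs, sub-vectors of length 2.
V = 2
WEIGHTS = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
CODEBOOKS = [[[1, 0], [0, 1]], [[1, 1], [2, -1]]]
TABLES = build_tables(WEIGHTS, CODEBOOKS, V)

exact = lut_matvec([0, 1, 2, -1], CODEBOOKS, TABLES, V)
approx = lut_matvec([0.1, 0.9, 1.9, -0.8], CODEBOOKS, TABLES, V)
```

In this toy setup, an input built exactly from codebook entries reproduces the dense result, while a nearby input snaps to the same entries and so yields the same output; the gap between that output and the true dense product is the quantization error a real system would have to control.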

Taxonomy

Topics: Natural Language Processing Techniques · Big Data and Digital Economy · Advanced Neural Network Applications