AXELRAM: Quantize Once, Never Dequantize

Yasushi Nishida

arXiv:2604.02638·cs.LG·April 6, 2026

TL;DR

AXELRAM introduces a fixed-codebook architecture for quantized attention computation that reduces per-query multiplications by 102.4x, and it addresses perplexity (PPL) instability with a gradient-free sign pattern selection method.

Contribution

The paper presents a novel SRAM macro architecture enabling direct attention score computation from quantized cache indices without dequantization, improving efficiency and stability.

Findings

01 Reduces per-query multiplications by 102.4x

02 Identifies sign pattern sensitivity causing PPL spikes in some models

03 Proposes a gradient-free sign pattern selection method that eliminates catastrophic spikes

Abstract

We propose AXELRAM, a smart SRAM macro architecture that computes attention scores directly from quantized KV cache indices without dequantization. The key enabler is a design-time fixed codebook: orthogonal-transform-based quantization concentrates each coordinate's distribution to N(0,1/d), so the optimal quantizer depends only on dimension d and bit-width b, not on input data. The asymmetric path design -- transform on write, table-lookup on read with no inverse transform -- reduces per-query multiplications by 102.4x (a mathematical identity). Through multi-seed evaluation (10 seeds x 3 models), we discover that sign pattern sensitivity causes catastrophic PPL spikes (Delta > 50) on certain models (Qwen2.5-3B), while others (LLaMA-3.1-8B) are fully stable. This phenomenon extends SpinQuant's observation of rotation variance in weight quantization to the KV cache domain, where the…
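The asymmetric path described in the abstract (transform on write, table lookup on read, no inverse transform) can be illustrated with a minimal NumPy sketch. Everything below is an assumption made for illustration rather than the paper's implementation: the orthogonal transform is taken to be random sign flips followed by a normalized Hadamard matrix, the codebook is a simple uniform grid over ±3σ with σ = 1/√d (a stand-in for the optimal quantizer), and the function names are hypothetical. The sketch relies only on the fact that an orthogonal R preserves dot products, so q·k = (Rq)·(Rk), which lets the score be read from a small per-query table indexed by the stored codes.

```python
import numpy as np

def hadamard(d):
    # Sylvester construction of a normalized Hadamard matrix; d must be a power of two.
    H = np.array([[1.0]])
    while H.shape[0] < d:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(d)

def make_transform(d, seed):
    # Assumed transform: R = H @ diag(signs), orthogonal for any sign pattern.
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1.0, 1.0], size=d)
    return hadamard(d) * signs

def make_codebook(d, bits):
    # Assumed stand-in codebook: uniform levels over +/-3 sigma, sigma = 1/sqrt(d).
    # It depends only on d and bits, never on the data.
    sigma = 1.0 / np.sqrt(d)
    return np.linspace(-3 * sigma, 3 * sigma, 2 ** bits)

def write_keys(K, R, codebook):
    # Write path: rotate each key once, then store only the nearest-level indices.
    Kr = K @ R.T
    idx = np.abs(Kr[..., None] - codebook).argmin(axis=-1)
    return idx.astype(np.uint8)

def scores_from_indices(q, idx, R, codebook):
    # Read path: one d x 2^b table of products per query, then lookups and adds only.
    qr = R @ q
    table = np.outer(qr, codebook)               # the only multiplications per query
    return table[np.arange(len(qr)), idx].sum(axis=1)

# Usage: 32 unit-norm keys of dimension 64, quantized to 4 bits per coordinate.
rng = np.random.default_rng(0)
d, n, bits = 64, 32, 4
K = rng.standard_normal((n, d))
K /= np.linalg.norm(K, axis=1, keepdims=True)
q = rng.standard_normal(d)
R = make_transform(d, seed=1)
codebook = make_codebook(d, bits)
idx = write_keys(K, R, codebook)
scores = scores_from_indices(q, idx, R, codebook)
```

In this sketch the per-query multiplication count is d × 2^b for the table plus the cost of rotating the query, independent of how many keys are cached; scoring each cached key afterwards uses only table lookups and additions. The `seed` argument selects the sign pattern, which is the quantity the abstract reports as sensitive on some models.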

Code & Models

Repository: Axelidea/AXELRAM (GitHub)
