Beyond N-gram: Data-Aware X-GRAM Extraction for Efficient Embedding Parameter Scaling

Yilong Chen, Yanxi Xie, Zitian Gao, He Xin, Yihao Xiao, Jason Klein Liu, Haoming Luo, Yifan Luo, Zhengmao Ye, Tingwen Liu, Xin Zhao, Ran Tao, and Bryan Dai

arXiv:2604.21724 · cs.CL · April 27, 2026

TL;DR

X-GRAM is a memory-efficient, data-aware method for extracting embeddings from large token-indexed lookup tables. It improves accuracy and scalability while decoupling model capacity from compute.

Contribution

The paper proposes X-GRAM, a frequency-aware dynamic token-injection framework that combines hybrid hashing with local n-gram features to improve parameter efficiency and scalability.
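The head/tail split behind hybrid hashing can be illustrated with a minimal sketch. It assumes that the most frequent token IDs (the "head") each keep a dedicated embedding row, while rare tail tokens are mapped by `num_hashes` salted hashes into a small shared bucket table and their "alias" vectors are averaged. The class name `HybridHashEmbedding`, its parameters, BLAKE2 as the hash, and mean-pooling as the mixing rule are hypothetical choices (a standard multi-hash embedding scheme swapped in for illustration), not the paper's implementation; the Longyichen/X-gram repository holds the actual method.

```python
import hashlib
import random
from collections import Counter


class HybridHashEmbedding:
    """Hypothetical sketch: dedicated rows for frequent (head) tokens,
    multi-hash shared buckets for rare (tail) tokens. Not the X-GRAM code."""

    def __init__(self, token_counts, head_size, tail_buckets, dim, num_hashes=2, seed=0):
        # Rank token IDs by frequency; the top `head_size` get their own row.
        ranked = [tok for tok, _ in Counter(token_counts).most_common()]
        self.head_index = {tok: i for i, tok in enumerate(ranked[:head_size])}
        self.head_size = head_size
        self.tail_buckets = tail_buckets
        self.num_hashes = num_hashes
        rng = random.Random(seed)
        # Rows [0, head_size) are head slots; rows [head_size, head_size + tail_buckets) are shared.
        self.table = [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                      for _ in range(head_size + tail_buckets)]

    def _tail_rows(self, token_id):
        # Each salt k acts as an independent hash function choosing one shared bucket.
        rows = []
        for k in range(self.num_hashes):
            digest = hashlib.blake2b(f"{k}:{token_id}".encode(), digest_size=8).digest()
            rows.append(self.head_size + int.from_bytes(digest, "little") % self.tail_buckets)
        return rows

    def rows_for(self, token_id):
        if token_id in self.head_index:
            return [self.head_index[token_id]]
        return self._tail_rows(token_id)

    def lookup(self, token_id):
        # Head token: its own vector. Tail token: mean of its hashed alias vectors.
        rows = self.rows_for(token_id)
        dim = len(self.table[0])
        return [sum(self.table[r][j] for r in rows) / len(rows) for j in range(dim)]


# Zipf-like toy counts: token t appears roughly 1000 / (t + 1) times.
counts = {t: 1000 // (t + 1) for t in range(100)}
emb = HybridHashEmbedding(counts, head_size=10, tail_buckets=16, dim=8)
print(len(emb.table))      # 26 rows instead of one per token (100)
print(emb.rows_for(0))     # [0]: the most frequent token owns row 0
print(emb.rows_for(50))    # two bucket indices in [10, 26)
```

With 100 token IDs, 10 head rows and 16 tail buckets, the table holds 26 rows instead of 100. Using two hashes per tail token makes it unlikely that two rare tokens collide on every bucket, so shared rows still yield mostly distinct mixtures.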

Findings

1. X-GRAM improves accuracy by up to 4.4 points over baseline models.
2. It reduces memory usage by 50% while maintaining performance.
3. Extensive evaluations demonstrate superior scalability at the 0.73B and 1.15B model scales.

Abstract

Large token-indexed lookup tables provide a compute-decoupled scaling path, but their practical gains are often limited by poor parameter efficiency and rapid memory growth. We attribute these limitations to Zipfian under-training of the long tail, heterogeneous demand across layers, and "slot collapse" that produces redundant embeddings. To address this, we propose X-GRAM, a frequency-aware dynamic token-injection framework. X-GRAM employs hybrid hashing and alias mixing to compress the tail while preserving head capacity, and refines retrieved vectors via normalized SwiGLU ShortConv to extract diverse local n-gram features. These signals are integrated into attention value streams and inter-layer residuals using depth-aware gating, effectively aligning static memory with dynamic context. This design introduces a memory-centric scaling axis that decouples model capacity from FLOPs.…
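The refinement and injection steps described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions: each retrieved vector is RMS-normalized; "ShortConv" is taken to be a causal depthwise convolution with a small kernel (here 3), so each output mixes the current and previous two positions, much like a local trigram feature; SwiGLU gating multiplies the SiLU of one convolution branch by a second branch; and depth-aware gating is modeled as one learnable scalar per layer passed through a sigmoid before the feature is added to the residual stream. All function names and the per-layer logit are hypothetical; injection into attention value streams would follow the same pattern and is not shown.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def silu(x):
    return x * sigmoid(x)


def rms_norm(v, eps=1e-6):
    scale = 1.0 / math.sqrt(sum(x * x for x in v) / len(v) + eps)
    return [x * scale for x in v]


def causal_depthwise_conv(seq, weights):
    """y[t][c] = sum_k weights[k][c] * seq[t - k][c]; positions before 0 count as zero."""
    dim = len(seq[0])
    out = []
    for t in range(len(seq)):
        row = [0.0] * dim
        for k in range(len(weights)):
            if t - k < 0:
                break
            for c in range(dim):
                row[c] += weights[k][c] * seq[t - k][c]
        out.append(row)
    return out


def swiglu_shortconv(retrieved, w_gate, w_value):
    # Hypothetical "normalized SwiGLU ShortConv": RMSNorm, two causal conv branches, SiLU gate.
    normed = [rms_norm(v) for v in retrieved]
    gate = causal_depthwise_conv(normed, w_gate)
    value = causal_depthwise_conv(normed, w_value)
    return [[silu(g) * u for g, u in zip(g_row, u_row)] for g_row, u_row in zip(gate, value)]


def depth_gated_inject(hidden, features, layer_logit):
    # Hypothetical depth-aware gate: one scalar per layer scales the memory signal.
    g = sigmoid(layer_logit)
    return [[h + g * f for h, f in zip(h_row, f_row)] for h_row, f_row in zip(hidden, features)]


rng = random.Random(0)
seq_len, dim, kernel = 5, 4, 3
retrieved = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(seq_len)]
w_gate = [[rng.gauss(0, 0.5) for _ in range(dim)] for _ in range(kernel)]
w_value = [[rng.gauss(0, 0.5) for _ in range(dim)] for _ in range(kernel)]
features = swiglu_shortconv(retrieved, w_gate, w_value)
hidden = [[0.0] * dim for _ in range(seq_len)]
layer_out = depth_gated_inject(hidden, features, layer_logit=-2.0)  # weak gate for this layer
```

Because the convolution is causal with kernel size 3, the feature at position t depends only on retrieved vectors t-2 through t, so the refined signal never uses future tokens.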

Code & Models

Repository: Longyichen/X-gram (GitHub)
