EmbBERT: Attention Under 2 MB Memory

Riccardo Bravin; Massimo Pavan; Hazem Hesham Yousef Shalby; Fabrizio Pittorino; Manuel Roveri

arXiv:2502.10001 · cs.CL · March 25, 2026

TL;DR

EmbBERT is a highly efficient transformer model designed for ultra-constrained devices, achieving competitive NLP performance within just 2 MB of memory through architectural simplifications and quantization.

Contribution

This paper introduces EmbBERT, a novel tiny language model optimized for extreme memory constraints, demonstrating effective NLP performance with only 2 MB of memory.

Findings

01

EmbBERT requires only 2 MB of memory while remaining competitive in accuracy with state-of-the-art models.

02

EmbBERT outperforms similarly sized, scaled-down versions of models such as BERT and MAMBA.

03

Quantization reduces memory usage to 781 kB without significant performance loss.
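
To make the quantization finding concrete, here is a minimal sketch of how storing weights as 8-bit integers shrinks their footprint. The summary above does not state which quantization scheme EmbBERT uses, so the symmetric per-tensor int8 scheme, the function names, and the 4096-weight layer below are all assumptions chosen for illustration. The sketch covers weight storage only and does not reproduce the 2 MB or 781 kB figures.

```python
import random
import struct


def quantize_int8(weights):
    """Symmetric per-tensor quantization (assumed scheme, not taken from the paper).

    Maps floats to integers in [-127, 127] using a single shared scale.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q, scale):
    """Recover approximate float values from the int8 codes."""
    return [v * scale for v in q]


random.seed(0)
# Hypothetical layer: 4096 weights drawn from a small Gaussian.
weights = [random.gauss(0.0, 0.02) for _ in range(4096)]

q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

# Storage cost: 4 bytes per float32 weight vs. 1 byte per int8 weight,
# plus 4 bytes for the single float32 scale.
fp32_bytes = len(struct.pack(f"{len(weights)}f", *weights))
int8_bytes = len(struct.pack(f"{len(q)}b", *q)) + 4

# Worst-case rounding error is bounded by half a quantization step.
max_err = max(abs(a - b) for a, b in zip(weights, restored))

print(f"float32 storage: {fp32_bytes} bytes")
print(f"int8 storage:    {int8_bytes} bytes")
print(f"max abs error:   {max_err:.6f} (half step = {scale / 2:.6f})")
```

The design choice illustrated is the usual trade-off: each weight drops from four bytes to one, at the cost of a rounding error of at most half a quantization step per weight. Whether that error is tolerable for a given model has to be checked empirically, as the finding above reports for EmbBERT.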

Abstract

Transformer architectures based on the attention mechanism have revolutionized natural language processing (NLP), driving major breakthroughs across virtually every NLP task. However, their substantial memory and computational requirements still hinder deployment on ultra-constrained devices such as wearables and Internet-of-Things (IoT) units, where available memory is limited to just a few megabytes. To address this challenge, we introduce EmbBERT, a tiny language model (TLM) architecturally designed for extreme efficiency. The model integrates a compact embedding layer, streamlined feed-forward blocks, and an efficient attention mechanism that together enable optimal performance under strict memory budgets. Through this redesign for the extreme edge, we demonstrate that highly simplified transformer architectures remain remarkably effective under tight resource constraints. EmbBERT…

Taxonomy

Methods: Attention Is All You Need · Adam · Softmax · Dropout · Weight Decay · WordPiece · Layer Normalization · Residual Connection · Linear Layer