TernaryLM: Memory-Efficient Language Modeling via Native 1.5-Bit Quantization with Adaptive Layer-wise Scaling

Nisharg Nargund; Priyesh Shukla

arXiv:2602.07374 · cs.CL · March 30, 2026

TL;DR

TernaryLM introduces a memory-efficient language model trained with native 1.5-bit ternary quantization and adaptive layer-wise scaling, delivering strong performance with a significantly reduced memory footprint.

Contribution

TernaryLM is the first to train a language model natively with ternary quantization from scratch, achieving memory savings while maintaining competitive performance.
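
The abstract attributes this from-scratch ternary training to straight-through estimators (STE) and adaptive per-layer scaling factors. The sketch below illustrates that general technique in NumPy; it is not the authors' code. The absmean scale, the `TernaryLinear` class, and the plain SGD update are hypothetical choices made for illustration. Latent full-precision weights are kept for the optimizer, ternarized to alpha * {-1, 0, +1} on every forward pass, and the STE passes gradients straight through the rounding step.

```python
import numpy as np


def ternarize(w: np.ndarray, eps: float = 1e-8):
    """Map latent full-precision weights to ternary codes plus one layer scale.

    ASSUMPTION: the scale is the layer's mean absolute weight ("absmean",
    as in BitNet b1.58); TernaryLM's adaptive scaling rule may differ.
    """
    alpha = float(np.abs(w).mean()) + eps          # one scalar scale per layer
    codes = np.clip(np.round(w / alpha), -1, 1)    # values in {-1, 0, +1}
    return codes.astype(np.int8), alpha


class TernaryLinear:
    """Hypothetical ternary linear layer trained with a straight-through estimator."""

    def __init__(self, in_features: int, out_features: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Latent full-precision weights: only these are updated by the optimizer.
        self.w = rng.normal(0.0, 0.02, size=(out_features, in_features))

    def forward(self, x: np.ndarray) -> np.ndarray:
        codes, alpha = ternarize(self.w)
        self._x = x                      # cache input for the backward pass
        self._w_q = alpha * codes        # dequantized ternary weights
        return x @ self._w_q.T

    def backward(self, grad_out: np.ndarray, lr: float) -> np.ndarray:
        # STE: treat d(quantize)/dw as the identity, so the gradient computed
        # for the quantized weights is applied directly to the latent weights.
        grad_w = grad_out.T @ self._x
        grad_x = grad_out @ self._w_q    # input gradient uses the ternary weights
        self.w -= lr * grad_w            # plain SGD step (illustrative choice)
        return grad_x
```

For example, `layer = TernaryLinear(4, 3)`, then `y = layer.forward(np.ones((2, 4)))` followed by `layer.backward(np.ones_like(y), lr=0.1)` performs one STE update of the latent weights. Keeping the latent weights in full precision lets small gradient steps accumulate until a weight crosses a rounding threshold and its ternary code changes.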

Findings

01. Achieves a validation perplexity of 58.42 on TinyStories.
02. Surpasses DistilBERT in downstream transfer, reaching 82.47% F1 on MRPC.
03. Reduces memory usage by 2.4x without increasing latency.
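
Finding 01 is stated in perplexity, the exponential of the mean per-token negative log-likelihood on held-out text. A minimal sketch of that standard definition, assuming natural-log token probabilities (the function name `perplexity` is illustrative, not from the paper's code):

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood per token).

    `token_log_probs` holds the natural-log probability the model assigned
    to each ground-truth token in the evaluation set.
    """
    if not token_log_probs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)
```

A model that assigns probability 1/4 to every correct token has a perplexity of 4. By the same definition, the reported 58.42 corresponds to a mean cross-entropy of ln(58.42) ≈ 4.07 nats per token. Because perplexity is averaged per token, comparisons across papers also depend on the tokenizer.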

Abstract

Large language models (LLMs) achieve remarkable performance but demand substantial computational resources, limiting deployment on edge devices and resource-constrained environments. We present TernaryLM, a 132M-parameter transformer trained natively with ternary quantization {-1, 0, +1} (log2(3) ≈ 1.58-bit effective precision), achieving significant memory reduction without sacrificing language modeling capability. Unlike post-training quantization approaches that quantize pre-trained full-precision models, TernaryLM learns quantization-aware representations from scratch using straight-through estimators and adaptive per-layer scaling factors. Our experiments demonstrate: (1) validation perplexity of 58.42 on TinyStories with a cross-seed standard deviation of ±0.17 PPL, confirming stable optimization; (2) strong downstream transfer with 82.47% F1 on MRPC, surpassing DistilBERT…
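
The abstract's figure of log2(3) ≈ 1.58 bits is the information-theoretic cost of one ternary weight; actual storage needs a packing scheme. One common scheme, assumed here and not taken from the paper, stores five ternary values per byte because 3^5 = 243 ≤ 256, which costs 8/5 = 1.6 bits per weight. The helpers `pack_ternary` and `unpack_ternary` are hypothetical:

```python
import math
from typing import List

# ASSUMPTION: base-3 packing of 5 trits per byte; TernaryLM's on-disk or
# in-memory format may differ.
TRITS_PER_BYTE = 5                       # 3**5 = 243 <= 256
BITS_PER_WEIGHT = 8 / TRITS_PER_BYTE     # 1.6, vs. the bound log2(3) ~ 1.585


def pack_ternary(values: List[int]) -> bytes:
    """Pack values in {-1, 0, +1} into bytes, five base-3 digits per byte."""
    out = bytearray()
    for start in range(0, len(values), TRITS_PER_BYTE):
        byte = 0
        for i, v in enumerate(values[start:start + TRITS_PER_BYTE]):
            if v not in (-1, 0, 1):
                raise ValueError(f"not a ternary value: {v}")
            byte += (v + 1) * 3 ** i     # map {-1, 0, +1} -> digit {0, 1, 2}
        out.append(byte)                 # max value is 242, fits in one byte
    return bytes(out)


def unpack_ternary(data: bytes, n: int) -> List[int]:
    """Inverse of pack_ternary; `n` drops padding digits in the last byte."""
    values = []
    for byte in data:
        for _ in range(TRITS_PER_BYTE):
            values.append(byte % 3 - 1)  # digit {0, 1, 2} -> {-1, 0, +1}
            byte //= 3
    return values[:n]
```

Packing `[1, 0, -1, -1, 1, 0, 0, 1]` yields two bytes, and `unpack_ternary(packed, 8)` restores the original list. The 2.4x memory reduction reported in the findings is the authors' measurement and should not be derived from this per-weight arithmetic, since runtime memory may also include activations and any parameters kept in higher precision.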

Code & Models

Repositories

1nisharg/TernaryLM-Memory-Efficient-Language-Modeling (GitHub)

Models

OpenRAG128/TernaryLM (Hugging Face model)