Spectra: Surprising Effectiveness of Pretraining Ternary Language Models at Scale
Ayush Kaushal, Tejas Vaidhya, Arnab Kumar Mondal, Tejas Pandey, Aaryan Bhagat, Irina Rish

TL;DR
This paper demonstrates that ternary language models (TriLMs) trained at scale can outperform traditional floating-point and quantized models of similar bit-widths, offering a promising path for more efficient large language models.
Contribution
It introduces the Spectra LLM suite, the first open collection of models across multiple bit-widths, and shows that TriLMs outperform quantized and float models at large scales, challenging existing assumptions.
Findings
TriLMs outperform QuantLMs and FloatLMs at scales over one billion parameters.
The 3.9B TriLM matches the performance of the 3.9B FloatLM despite fewer bits.
Open release of 500+ checkpoints facilitates further research.
Abstract
Rapid advancements in GPU computational power have outpaced memory capacity and bandwidth growth, creating bottlenecks in Large Language Model (LLM) inference. Post-training quantization is the leading method for addressing memory-related bottlenecks in LLM inference, but it suffers from significant performance degradation below 4-bit precision. This paper addresses these challenges by investigating the pretraining of low-bitwidth models, specifically Ternary Language Models (TriLMs), as an alternative to traditional floating-point models (FloatLMs) and their post-training quantized versions (QuantLMs). We present the Spectra LLM suite, the first open suite of LLMs spanning multiple bit-widths, including FloatLMs, QuantLMs, and TriLMs, ranging from 99M to 3.9B parameters trained on 300B tokens. Our comprehensive evaluation demonstrates that TriLMs offer superior scaling behavior in terms of…
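To make the memory argument concrete, here is a rough back-of-the-envelope sketch of weight storage at different bit-widths for the largest model size mentioned above. It is illustrative only and is not taken from the paper. It assumes 16 bits per weight for the float baseline and 4 bits for the quantized one, and it uses the information-theoretic minimum of log2(3) ≈ 1.58 bits for a three-valued weight. The helper name `weight_memory_gib` is hypothetical. The estimate ignores scale factors, any layers kept in higher precision, activations, and the KV cache, so real footprints will be larger.

```python
import math

# Information-theoretic minimum for a weight with three possible states.
BITS_PER_TERNARY_WEIGHT = math.log2(3)  # about 1.585 bits


def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB.

    Assumption: every parameter is stored at the same bit-width; scales,
    higher-precision layers, activations, and KV cache are ignored.
    """
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    params = 3.9e9  # largest model size mentioned in the abstract
    # Bit-widths below are illustrative assumptions, not figures from the paper.
    configs = [
        ("16-bit float", 16.0),
        ("4-bit quantized", 4.0),
        ("ternary (log2 3)", BITS_PER_TERNARY_WEIGHT),
    ]
    for label, bits in configs:
        print(f"{label:>18}: {weight_memory_gib(params, bits):.2f} GiB")
```

Under these assumptions the ternary weights take roughly a tenth of the space of the 16-bit weights, which is the kind of gap that matters when inference is limited by memory capacity and bandwidth, not by compute.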
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
