Tiny language models

Ronit D. Gross; Yarden Tzach; Tal Halevi; Ella Koresh; Ido Kanter

arXiv:2507.14871·cs.CL·November 11, 2025

Open Access

TL;DR

This study investigates tiny language models (TLMs), shows that they exhibit key qualitative features of larger models, including a clear performance benefit from pre-training, and introduces a soft committee method for low-latency TLMs.

Contribution

The paper shows that pre-trained tiny language models retain essential NLP capabilities and introduces a soft committee approach for low-latency inference.

Findings

01 Pre-trained TLMs outperform non-pre-trained models on classification tasks.

02 The performance gap grows with the size of the pre-training dataset and with greater token overlap between the pre-training and classification datasets.

03 A soft committee of shallow models can replicate the accuracy of a deep TLM.
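
The third finding can be illustrated with a minimal sketch of a soft committee. This listing does not state the paper's exact combination rule, so the sketch assumes the common soft-voting form: each shallow model outputs class logits, the logits are converted to probabilities with a softmax, and the probabilities are averaged across members before taking the argmax. The function names and the logit values are hypothetical and chosen only for illustration.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_committee(member_logits):
    """Average per-class probabilities over all committee members.

    Assumption: plain unweighted averaging of softmax outputs; the
    paper's actual rule may differ.
    """
    probs = [softmax(logits) for logits in member_logits]
    n_classes = len(probs[0])
    return [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]


def predict(member_logits):
    """Return the index of the class with the highest averaged probability."""
    avg = soft_committee(member_logits)
    return max(range(len(avg)), key=avg.__getitem__)


# Hypothetical logits from three shallow models on one 3-class input.
member_logits = [
    [2.0, 1.0, 0.1],
    [0.5, 1.5, 0.2],
    [1.8, 0.3, 0.4],
]

averaged = soft_committee(member_logits)
label = predict(member_logits)
print(averaged, label)
```

Because the members are independent, their forward passes can in principle run in parallel, so under this assumption the committee's latency is set by a single shallow model rather than by the depth of one large model.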

Abstract

A prominent achievement of natural language processing (NLP) is its ability to understand and generate meaningful human language. This capability relies on complex feedforward transformer block architectures pre-trained on large language models (LLMs). However, LLM pre-training is currently feasible only for a few dominant companies due to the immense computational resources required, limiting broader research participation. This creates a critical need for more accessible alternatives. In this study, we explore whether tiny language models (TLMs) exhibit the same key qualitative features of LLMs. We demonstrate that TLMs exhibit a clear performance gap between pre-trained and non-pre-trained models across classification tasks, indicating the effectiveness of pre-training, even at a tiny scale. The performance gap increases with the size of the pre-training dataset and with greater…

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling