TinyStories: How Small Can Language Models Be and Still Speak Coherent English?

Ronen Eldan; Yuanzhi Li

arXiv:2305.07759 · cs.CL · May 26, 2023 · 46 cites

TL;DR

This paper introduces TinyStories, a synthetic dataset of simple stories generated by GPT-3.5 and GPT-4. It shows that small language models trained on it can produce stories that are coherent, diverse, and demonstrate reasoning, challenging the notion that much larger models are necessary for these capabilities.

Contribution

The paper presents TinyStories, a new dataset together with an evaluation framework, and shows that small, simple models can generate high-quality, coherent stories that exhibit reasoning, advancing research on low-resource NLP.
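The evaluation side of this contribution uses GPT-4 as a grader of generated stories. As a rough sketch of the bookkeeping around such a grader, assuming it is asked to reply with lines such as `Grammar: 8/10`, the code below builds a grading request and averages the parsed scores over many completions. The rubric names (grammar, creativity, consistency), the prompt wording, and the reply format are illustrative assumptions rather than the paper's exact setup, and the call to GPT-4 itself is left out.

```python
import re
from statistics import mean

# Assumed rubric dimensions; the grader is assumed to answer "Name: X/10".
DIMENSIONS = ("grammar", "creativity", "consistency")

# Matches e.g. "Grammar: 8/10" or "creativity : 6.5 / 10" (not "/100").
_SCORE = re.compile(r"(\w+)\s*:\s*(\d+(?:\.\d+)?)\s*/\s*10\b", re.IGNORECASE)


def build_grading_prompt(story_start: str, completion: str) -> str:
    """Compose a grading request for an LLM judge (illustrative wording)."""
    wanted = ", ".join(f"'{d.capitalize()}: X/10'" for d in DIMENSIONS)
    return (
        "The following story was started by a teacher and completed by a student.\n"
        f"Beginning: {story_start}\n"
        f"Student completion: {completion}\n"
        f"Grade the completion, one line per score, as {wanted}."
    )


def parse_scores(reply: str) -> dict[str, float]:
    """Extract scores for the known rubric dimensions from one grader reply."""
    scores = {}
    for name, value in _SCORE.findall(reply):
        key = name.lower()
        if key in DIMENSIONS:  # ignore any dimension we did not ask for
            scores[key] = float(value)
    return scores


def average_scores(replies: list[str]) -> dict[str, float]:
    """Average each dimension over all replies that contain it."""
    parsed = [parse_scores(r) for r in replies]
    return {
        d: mean(p[d] for p in parsed if d in p)
        for d in DIMENSIONS
        if any(d in p for p in parsed)
    }


if __name__ == "__main__":
    replies = [
        "Grammar: 8/10\nCreativity: 6/10\nConsistency: 7/10",
        "Grammar: 9/10, Creativity: 7/10, Consistency: 9/10",
    ]
    print(average_scores(replies))
```

Averaging per dimension, rather than collapsing everything into one number, keeps the separate story qualities visible when comparing models of different sizes.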

Findings

01. Small models (<10M parameters) can generate coherent stories.
02. A new GPT-4-based evaluation framework assesses multiple story qualities.
03. TinyStories enables research in low-resource language modeling.
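To put the under-10M figure next to the roughly 125M-parameter GPT-2 (small) baseline named in the abstract, the arithmetic below counts the parameters of a GPT-2-style decoder-only transformer. It is a sketch under stated assumptions (tied input/output embeddings, learned position embeddings, biases on every linear layer, a feed-forward width of 4 × d_model, pre-LayerNorm blocks); the TinyStories models may differ in these details, and the small configuration is hypothetical.

```python
def gpt2_style_param_count(vocab: int, ctx: int, d_model: int, n_layers: int) -> int:
    """Parameter count of a GPT-2-style decoder-only transformer.

    Assumptions: tied input/output embeddings, learned absolute position
    embeddings, biases on all linear layers, 4*d_model feed-forward width,
    two LayerNorms per block plus one final LayerNorm.
    """
    d = d_model
    embeddings = vocab * d + ctx * d                # token + position tables
    attention = (d * 3 * d + 3 * d) + (d * d + d)   # fused Q/K/V + output projection
    mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)     # position-wise feed-forward layer
    layer_norms = 2 * (2 * d)                       # scale + shift for two LayerNorms
    per_layer = attention + mlp + layer_norms       # equals 12*d^2 + 13*d
    return embeddings + n_layers * per_layer + 2 * d  # + final LayerNorm


if __name__ == "__main__":
    print(gpt2_style_param_count(50257, 1024, 768, 12))  # 124439808 (GPT-2 small)
    print(gpt2_style_param_count(10_000, 512, 256, 8))   # 9009664 (hypothetical small config)
```

With GPT-2 small's configuration the function reproduces its usual count of about 124.4M. At small widths the token-embedding table dominates: a 50,257-token vocabulary at d_model = 256 already costs 50,257 × 256 ≈ 12.9M parameters on its own, which is why the hypothetical small configuration assumes a 10,000-token vocabulary.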

Abstract

Language models (LMs) are powerful tools for natural language processing, but they often struggle to produce coherent and fluent text when they are small. Models with around 125M parameters such as GPT-Neo (small) or GPT-2 (small) can rarely generate coherent and consistent English text beyond a few words even after extensive training. This raises the question of whether the emergence of the ability to produce coherent English text only occurs at larger scales (with hundreds of millions of parameters or more) and complex architectures (with many layers of global attention). In this work, we introduce TinyStories, a synthetic dataset of short stories that only contain words that typical 3- to 4-year-olds usually understand, generated by GPT-3.5 and GPT-4. We show that TinyStories can be used to train and evaluate LMs that are much smaller than the state-of-the-art models (below 10…
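One way to keep a large synthetic corpus diverse while restricting it to words young children understand is to ask the generating model to work a few randomly chosen simple words into each story. The sketch below builds such prompts. The word lists, the optional story features, and the prompt wording are illustrative placeholders (a real list would hold many more basic words), and the call to GPT-3.5 or GPT-4 is omitted.

```python
import random

# Hypothetical toy vocabulary split by part of speech; a real list would be
# far larger and limited to words a 3- to 4-year-old is likely to know.
NOUNS = ["dog", "ball", "tree", "cake", "boat"]
VERBS = ["jump", "share", "find", "run", "help"]
ADJECTIVES = ["happy", "big", "red", "shiny", "soft"]
# Hypothetical optional features that add variety to story structure.
FEATURES = ["dialogue", "a plot twist", "a bad ending", "a moral value"]


def make_story_prompt(rng: random.Random, max_features: int = 2) -> str:
    """Build one generation prompt from a random noun/verb/adjective triple."""
    noun = rng.choice(NOUNS)
    verb = rng.choice(VERBS)
    adj = rng.choice(ADJECTIVES)
    # Pick between 0 and max_features distinct features (inclusive).
    features = rng.sample(FEATURES, rng.randint(0, max_features))
    prompt = (
        "Write a short story (3-5 paragraphs) which only uses very simple words "
        "that a 3 year old child would likely understand. "
        f'The story should use the verb "{verb}", the noun "{noun}" '
        f'and the adjective "{adj}".'
    )
    if features:
        prompt += " The story should have the following features: " + ", ".join(features) + "."
    return prompt


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the prompt set is reproducible
    for _ in range(3):
        print(make_story_prompt(rng))
```

Because every prompt draws a fresh word combination, two stories rarely share the same ingredients even though the vocabulary stays small; each prompt would then be sent to the generating model.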

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification

Methods: Multi-Head Attention · Attention Is All You Need · Cosine Annealing · Linear Layer · Weight Decay · Attention Dropout · Position-Wise Feed-Forward Layer