TinyStories: How Small Can Language Models Be and Still Speak Coherent English?
Ronen Eldan, Yuanzhi Li

TL;DR
This paper introduces TinyStories, a synthetic dataset of simple stories generated by GPT-3.5 and GPT-4, and shows that small language models trained on it can produce coherent, diverse stories that exhibit basic reasoning, challenging the notion that such capabilities require larger models.
Contribution
The paper presents TinyStories, a new dataset and evaluation framework showing that small, simple models can generate high-quality, coherent stories that display basic reasoning, supporting low-resource NLP research.
Findings
Small models (<10M parameters) can generate coherent stories.
A new GPT-4-based evaluation framework assesses multiple story qualities.
TinyStories enables research in low-resource language modeling.
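To put the parameter counts quoted on this page in perspective (under 10M for the small models, around 125M for GPT-2 small), the sketch below counts the weights of a GPT-2-style decoder with tied input and output embeddings. The function name and the small configuration are hypothetical and are not taken from the paper; the TinyStories models may use a different architecture, so treat this as a rough illustration only.

```python
def gpt2_style_param_count(vocab_size, context_len, d_model, n_layers):
    """Count parameters of a GPT-2-style decoder with tied input/output embeddings."""
    # Token and learned position embeddings.
    embeddings = vocab_size * d_model + context_len * d_model
    # Fused QKV projection plus output projection, each with a bias.
    attention = (d_model * 3 * d_model + 3 * d_model) + (d_model * d_model + d_model)
    # Feed-forward block with a 4x hidden expansion, each layer with a bias.
    mlp = (d_model * 4 * d_model + 4 * d_model) + (4 * d_model * d_model + d_model)
    # Two layer norms per block, each with a scale and a shift vector.
    layer_norms = 2 * (2 * d_model)
    per_layer = attention + mlp + layer_norms
    final_layer_norm = 2 * d_model
    return embeddings + n_layers * per_layer + final_layer_norm


# GPT-2 (small): 50257-token vocabulary, 1024 context, width 768, 12 layers.
gpt2_small = gpt2_style_param_count(50257, 1024, 768, 12)

# Hypothetical small configuration, chosen only for illustration.
tiny = gpt2_style_param_count(50257, 512, 64, 8)

print(f"GPT-2 small: {gpt2_small:,}")  # prints 124,439,808
print(f"Hypothetical tiny config: {tiny:,}")
```

In the small configuration, most of the weights sit in the token embedding table rather than in the transformer blocks, which is worth keeping in mind when comparing headline parameter counts.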
Abstract
Language models (LMs) are powerful tools for natural language processing, but they often struggle to produce coherent and fluent text when they are small. Models with around 125M parameters such as GPT-Neo (small) or GPT-2 (small) can rarely generate coherent and consistent English text beyond a few words even after extensive training. This raises the question of whether the emergence of the ability to produce coherent English text only occurs at larger scales (with hundreds of millions of parameters or more) and complex architectures (with many layers of global attention). In this work, we introduce TinyStories, a synthetic dataset of short stories that only contain words that typical 3 to 4-year-olds usually understand, generated by GPT-3.5 and GPT-4. We show that TinyStories can be used to train and evaluate LMs that are much smaller than the state-of-the-art models (below 10…
Code & Models
- 🤗 SauravP97/tiny-stories-19M (model): 17 downloads, 2 likes
- 🤗 nishantup/nanogpt-pretrained-slm-tinystories-124m (model): 487 downloads, 3 likes
- 🤗 abhilash88/tinystories-slm-gpt (model): 32 downloads, 2 likes
- 🤗 StentorLabs/Stentor2-12M (model): 124 downloads, 2 likes
- 🤗 roneneldan/TinyStories-1M (model): 88k downloads, 62 likes
- 🤗 roneneldan/TinyStories-33M (model): 55k downloads, 108 likes
- 🤗 roneneldan/TinyStories-3M (model): 2.0k downloads, 3 likes
- 🤗 RajuKandasamy/tamillama_tiny_30m (model): 31 downloads, 15 likes
- 🤗 segestic/Tinystories-gpt-0.1-3m (model): 8 downloads, 1 like
- 🤗 igorktech/PicoSatirik (model): 15 downloads
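As a minimal sketch of how one of the checkpoints listed above might be tried out, the snippet below loads `roneneldan/TinyStories-33M` with the Hugging Face `transformers` library and samples a short continuation. It assumes that `transformers` and PyTorch are installed, that the checkpoint loads through `AutoModelForCausalLM`, and that it pairs with the GPT-Neo tokenizer (`EleutherAI/gpt-neo-125M`); the function name, prompt, and sampling settings are illustrative choices, not the authors' evaluation setup.

```python
MODEL_ID = "roneneldan/TinyStories-33M"
# Assumption: the checkpoint uses the GPT-Neo tokenizer.
TOKENIZER_ID = "EleutherAI/gpt-neo-125M"


def generate_story(prompt, max_new_tokens=100):
    """Sample a continuation of `prompt` (hypothetical helper; downloads weights on first use)."""
    # Imported lazily so that defining the helper does not require the library.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(TOKENIZER_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,  # sample rather than decode greedily
        top_p=0.9,       # illustrative nucleus-sampling threshold
        pad_token_id=tokenizer.eos_token_id,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_story("Once upon a time there was")` should return the prompt followed by a sampled continuation; the output varies between runs because sampling is enabled.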
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
Methods: Multi-Head Attention · Attention Is All You Need · Cosine Annealing · Linear Layer · Weight Decay · Attention Dropout · Position-Wise Feed-Forward Layer
