Super Tiny Language Models
Dylan Hillier, Leon Guertler, Cheston Tan, Palaash Agrawal, Ruirui Chen, Bobby Cheng

TL;DR
This paper presents techniques for building extremely small language models that aim to retain strong performance while drastically reducing parameter counts, with the goal of making NLP models more accessible and efficient.
Contribution
It explores techniques such as byte-level tokenization with a pooling mechanism, weight tying, and efficient training strategies for building Super Tiny Language Models with 10M to 100M parameters.
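To make the effect of weight tying and a byte-level vocabulary concrete, the following is a minimal sketch of the parameter arithmetic for the embedding layers only. The vocabulary size (50,257, the size of the GPT-2 BPE vocabulary) and the hidden width of 512 are illustrative assumptions, not values taken from the paper, and `embedding_params` is a hypothetical helper.

```python
def embedding_params(vocab_size: int, d_model: int, tied: bool) -> int:
    """Count parameters in the input embedding plus the output projection (no bias)."""
    input_embedding = vocab_size * d_model
    # With weight tying, the output projection reuses the input embedding matrix,
    # so it contributes no additional parameters.
    output_projection = 0 if tied else vocab_size * d_model
    return input_embedding + output_projection


vocab_size = 50_257  # assumption: GPT-2-sized subword vocabulary
d_model = 512        # assumption: illustrative hidden width

untied = embedding_params(vocab_size, d_model, tied=False)
tied = embedding_params(vocab_size, d_model, tied=True)
byte_level = embedding_params(256, d_model, tied=True)  # one entry per byte value

print(f"untied subword embeddings: {untied:,}")
print(f"tied subword embeddings:   {tied:,}")
print(f"tied byte-level table:     {byte_level:,}")
```

Under these assumed sizes, tying halves the embedding cost, and a 256-entry byte table shrinks it by a further factor of roughly 200. At a 10M to 100M parameter budget, that difference is a large share of the whole model.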
Findings
Reduced parameter counts, with matching base transformer performance as a goal for future work
Exploration of tokenizer-free and self-play training methods
Targeting models with 10M, 50M, and 100M parameters
Abstract
The rapid advancement of large language models (LLMs) has led to significant improvements in natural language processing but also poses challenges due to their high computational and energy demands. This paper introduces a series of research efforts focused on Super Tiny Language Models (STLMs), which aim to deliver high performance with significantly reduced parameter counts. We explore innovative techniques such as byte-level tokenization with a pooling mechanism, weight tying, and efficient training strategies. These methods aim to significantly reduce the parameter count compared to traditional models -- in future works, we aim to build on these in a way that maintains and improves upon the performance of base transformer models. This series of papers will explore various subproblems, including tokenizer-free models, self-play based training, and alternative training…
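As a rough illustration of byte-level tokenization with pooling, the sketch below encodes text as UTF-8 bytes, looks up one embedding per byte, and averages fixed-size windows of byte embeddings into shorter sequences. The fixed window, mean pooling, the random embedding table, and all function names are assumptions made for illustration; the abstract does not specify the pooling mechanism, and the authors' design may differ.

```python
import random


def byte_ids(text: str) -> list[int]:
    """Tokenize by UTF-8 encoding: every id is a byte value in [0, 255]."""
    return list(text.encode("utf-8"))


def make_embedding_table(d_model: int, seed: int = 0) -> list[list[float]]:
    """Hypothetical 256-row embedding table with random values (stand-in for learned weights)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(d_model)] for _ in range(256)]


def mean_pool(ids: list[int], table: list[list[float]], window: int) -> list[list[float]]:
    """Average the embeddings of each consecutive window of bytes (assumed pooling scheme)."""
    pooled = []
    for start in range(0, len(ids), window):
        chunk = ids[start:start + window]
        vectors = [table[i] for i in chunk]
        # zip(*vectors) iterates over embedding dimensions; the last chunk may be shorter.
        pooled.append([sum(col) / len(chunk) for col in zip(*vectors)])
    return pooled


ids = byte_ids("tiny models")
table = make_embedding_table(d_model=8)
pooled = mean_pool(ids, table, window=4)
print(len(ids), "bytes pooled into", len(pooled), "vectors of width", len(pooled[0]))
```

Pooling shortens the sequence the transformer has to process, which offsets the longer inputs that byte-level tokenization would otherwise produce.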
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
Methods: Balanced Selection
