Super Tiny Language Models

Dylan Hillier, Leon Guertler, Cheston Tan, Palaash Agrawal, Chen Ruirui, Bobby Cheng

arXiv:2405.14159 · cs.CL · June 27, 2024 · 1 citation

TL;DR

This paper outlines a research program on Super Tiny Language Models (STLMs): techniques for building very small language models that drastically reduce parameter counts while aiming to retain strong performance, with the goal of making NLP models more accessible and efficient.

Contribution

It explores techniques such as byte-level tokenization with a pooling mechanism, weight tying, and efficient training strategies for building super tiny language models with 10M to 100M parameters.
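Weight tying reuses the input embedding matrix as the output (language-model head) projection, so one vocabulary-by-width matrix is stored instead of two. The sketch below is a minimal plain-Python illustration, not the authors' implementation (the linked repository uses PyTorch); the helper names `embedding_params` and `TiedEmbedding` are hypothetical. It counts embedding-related parameters with and without tying and shows a single matrix serving both the token lookup and the logit computation.

```python
from typing import List

Matrix = List[List[float]]


def embedding_params(vocab_size: int, d_model: int, tied: bool) -> int:
    """Parameters in the input embedding plus the output (LM-head) projection.

    Untied: two separate vocab_size x d_model matrices.
    Tied: a single matrix serves both roles.
    """
    per_matrix = vocab_size * d_model
    return per_matrix if tied else 2 * per_matrix


class TiedEmbedding:
    """Toy (hypothetical) tied embedding: one matrix embeds tokens and produces logits."""

    def __init__(self, weight: Matrix):
        self.weight = weight  # shape: vocab_size x d_model

    def embed(self, token_id: int) -> List[float]:
        # Input side: look up one row of the shared matrix.
        return self.weight[token_id]

    def logits(self, hidden: List[float]) -> List[float]:
        # Output side: logits = W @ h, scoring every vocabulary entry against
        # the same rows used for lookup instead of a separate LM-head matrix.
        return [sum(w * h for w, h in zip(row, hidden)) for row in self.weight]
```

For example, with a GPT-2-sized vocabulary of 50,257 tokens and an assumed model width of 512, `embedding_params(50257, 512, tied=False)` gives 51,463,168 parameters and `tied=True` gives 25,731,584. A saving of that size is substantial when the whole model budget is 10M to 100M parameters.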

Findings

01 Reducing parameter counts while aiming to maintain performance

02 Exploration of tokenizer-free and self-play-based training methods

03 Targeting models with 10M, 50M, and 100M parameters
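At budgets of 10M to 100M parameters, a large subword vocabulary can consume a big share of the model. The function below is a back-of-the-envelope sketch under stated assumptions, not a formula from the paper: standard decoder-only blocks (Q, K, V and output projections plus a two-layer MLP with 4x expansion), tied embeddings by default, and biases, layer norms and positional embeddings ignored. The function name and the layer and width values in the usage note are hypothetical.

```python
def transformer_params(n_layers: int, d_model: int, vocab_size: int,
                       mlp_ratio: int = 4, tied: bool = True) -> dict:
    """Approximate parameter count for a decoder-only transformer (hypothetical helper).

    Per block: attention projections Q, K, V, O give 4 * d^2, and the MLP
    (d -> mlp_ratio * d -> d) gives 2 * mlp_ratio * d^2.
    Biases, layer norms and positional embeddings are ignored.
    """
    per_block = 4 * d_model ** 2 + 2 * mlp_ratio * d_model ** 2
    blocks = n_layers * per_block
    # Tied: one vocab_size x d_model matrix; untied: input and output copies.
    embeddings = vocab_size * d_model * (1 if tied else 2)
    return {"blocks": blocks, "embeddings": embeddings, "total": blocks + embeddings}
```

With an assumed 8 layers and width 512, `transformer_params(8, 512, 50257)` gives 25,165,824 block parameters plus 25,731,584 embedding parameters (50,897,408 total), so about half of this roughly 50M-parameter model is the tied embedding. With a 256-entry byte vocabulary, `transformer_params(8, 512, 256)` totals 25,296,896, and the embedding share falls below 1%.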

Abstract

The rapid advancement of large language models (LLMs) has led to significant improvements in natural language processing but also poses challenges due to their high computational and energy demands. This paper introduces a series of research efforts focused on Super Tiny Language Models (STLMs), which aim to deliver high performance with significantly reduced parameter counts. We explore innovative techniques such as byte-level tokenization with a pooling mechanism, weight tying, and efficient training strategies. These methods aim to significantly reduce the parameter count compared to traditional models; in future work, we aim to build on these in a way that maintains and improves upon the performance of base transformer models. This series of papers will explore various subproblems, including tokenizer-free models, self-play based training, and alternative training…
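Byte-level tokenization replaces a learned subword vocabulary with the 256 possible byte values, which shrinks the embedding table but lengthens sequences; pooling groups of bytes into single vectors is one way to shorten the sequence again. The abstract does not specify how the pooling mechanism works, so the sketch below assumes the simplest variant, mean pooling over fixed, non-overlapping windows of byte embeddings, with a random untrained embedding table. All function names here are hypothetical.

```python
import random
from typing import List


def byte_ids(text: str) -> List[int]:
    # Tokenizer-free input: every UTF-8 byte is a token id in [0, 255].
    return list(text.encode("utf-8"))


def make_byte_embeddings(d_model: int, seed: int = 0) -> List[List[float]]:
    # Toy 256 x d_model embedding table (random values, not trained).
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(d_model)] for _ in range(256)]


def pool_bytes(ids: List[int], table: List[List[float]], window: int) -> List[List[float]]:
    # Assumed pooling: average the byte embeddings in each fixed, non-overlapping
    # window, shortening the sequence by roughly a factor of `window`.
    d_model = len(table[0])
    pooled = []
    for start in range(0, len(ids), window):
        chunk = ids[start:start + window]  # the last chunk may be shorter
        pooled.append([sum(table[b][j] for b in chunk) / len(chunk)
                       for j in range(d_model)])
    return pooled
```

`byte_ids("héllo world")` yields 12 ids because "é" encodes to two UTF-8 bytes; with `window=4`, `pool_bytes` turns them into 3 vectors. A learned pooling, or one that follows word boundaries, would be a natural alternative; the fixed window is used here only for brevity.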

Code & Models

Repositories

leonguertler/supertinylanguagemodels (PyTorch)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Balanced Selection