Large Vocabulary Size Improves Large Language Models
Sho Takase, Ryokan Ri, Shun Kiyono, Takuya Kato

TL;DR
This paper shows empirically that increasing the subword vocabulary size improves large language model (LLM) performance. It also introduces a simple method for replacing the vocabulary during continual training, which outperforms keeping the vocabulary defined in pre-training.
Contribution
It provides empirical evidence for the benefits of larger vocabularies and proposes a new approach for replacing the vocabulary during continual training.
Findings
Larger vocabularies improve LLM performance
Replacing the vocabulary during continual training on a new target language yields better results
A simple vocabulary adaptation outperforms keeping the pre-defined (pre-training) vocabulary
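The summary reports only the empirical result that larger vocabularies help. It does not explain the mechanism. One effect of vocabulary size that is easy to see, and is not a claim made by the summary, is that a larger vocabulary splits the same text into fewer tokens. The sketch below shows this with a toy greedy longest-match tokenizer and two hand-picked vocabularies. Both the tokenizer and the vocabularies are hypothetical illustrations. This is not the subword algorithm used in the paper.

```python
def greedy_tokenize(text: str, vocab: set[str]) -> list[str]:
    """Toy greedy longest-match segmentation (hypothetical, for illustration).

    At each position, take the longest vocabulary entry that matches;
    if nothing matches, fall back to a single character.
    """
    max_len = max((len(t) for t in vocab), default=1)
    tokens: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for j in range(min(len(text), i + max_len), i, -1):
            piece = text[i:j]
            if piece in vocab or j == i + 1:
                tokens.append(piece)
                i = j
                break
    return tokens


# Hypothetical vocabularies: the larger one adds two whole-word entries.
small_vocab = {"lang", "uage", " ", "model", "s"}
large_vocab = small_vocab | {"language", " models"}

text = "language models"
print(greedy_tokenize(text, small_vocab))  # ['lang', 'uage', ' ', 'model', 's']
print(greedy_tokenize(text, large_vocab))  # ['language', ' models']
```

The trade-off runs the other way for parameters. The input embedding table has one row per vocabulary entry, so its size grows linearly with vocabulary size (V × d for hidden size d).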
Abstract
This paper empirically investigates the relationship between subword vocabulary size and the performance of large language models (LLMs) to provide insights on how to define the vocabulary size. Experimental results show that larger vocabulary sizes lead to better performance in LLMs. Moreover, we consider a continual training scenario where a pre-trained language model is trained on a different target language. We introduce a simple method to use a new vocabulary instead of the pre-defined one. We show that using the new vocabulary outperforms the model with the vocabulary used in pre-training.
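The abstract does not specify how the embeddings for a new vocabulary are initialized. A common heuristic, assumed here and not necessarily the authors' method, works as follows. Each token that the old and new vocabularies share keeps its pre-trained vector, and each new token starts from the mean of all pre-trained vectors. The function name `init_new_embeddings` and the plain-list representation are hypothetical.

```python
def init_new_embeddings(
    old_vocab: list[str],
    old_emb: list[list[float]],
    new_vocab: list[str],
) -> list[list[float]]:
    """Build an embedding table for new_vocab from a pre-trained table.

    Assumed heuristic (hypothetical, not taken from the paper):
    - tokens present in old_vocab reuse their pre-trained vectors;
    - unseen tokens are initialized to the mean pre-trained vector.
    """
    dim = len(old_emb[0])
    old_index = {tok: i for i, tok in enumerate(old_vocab)}
    # Column-wise mean over all rows of the pre-trained embedding table.
    mean = [sum(row[k] for row in old_emb) / len(old_emb) for k in range(dim)]
    new_emb: list[list[float]] = []
    for tok in new_vocab:
        if tok in old_index:
            new_emb.append(list(old_emb[old_index[tok]]))  # copy shared token
        else:
            new_emb.append(list(mean))  # new token starts at the mean
    return new_emb


old_vocab = ["a", "b"]
old_emb = [[1.0, 0.0], [0.0, 1.0]]
print(init_new_embeddings(old_vocab, old_emb, ["b", "c"]))  # [[0.0, 1.0], [0.5, 0.5]]
```

If the model's output projection is not tied to the input embeddings, that projection would need a similar re-initialization. Continual training on the target language then updates both tables.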
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
