Accelerating Multilingual Language Model for Excessively Tokenized Languages
Jimin Hong, Gibbeum Lee, Jaewoong Cho

TL;DR
This paper presents a framework to accelerate multilingual language models by reducing token fragmentation in non-Roman languages through targeted fine-tuning of a new language-specific head, improving generation speed without performance loss.
Contribution
The paper adds a language-specific head to a pre-trained multilingual model and fine-tunes only that head, improving generation efficiency for excessively tokenized languages.
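The idea of training only a new head on top of frozen parameters can be sketched with a toy example. This is a minimal illustration, not the authors' implementation: the backbone table, the sizes `HIDDEN` and `NEW_VOCAB`, the learning rate, and the training pairs are all hypothetical, and the verification step mentioned in the abstract is omitted.

```python
import math
import random

random.seed(0)

HIDDEN = 4      # hidden size of the toy backbone (hypothetical)
NEW_VOCAB = 3   # size of the toy target-language vocabulary (hypothetical)

# Frozen "backbone": a fixed table mapping a context id to a hidden vector.
# It stands in for the pre-trained model and is never updated below.
backbone = [[random.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(3)]
backbone_before = [row[:] for row in backbone]

# Trainable new head: one weight row per target-language token.
head = [[0.0] * HIDDEN for _ in range(NEW_VOCAB)]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(context_id):
    """Return the frozen hidden state and the head's token distribution."""
    h = backbone[context_id]
    logits = [sum(w * x for w, x in zip(row, h)) for row in head]
    return h, softmax(logits)


def mean_loss(data):
    """Average cross-entropy of the head over (context, target) pairs."""
    return sum(-math.log(predict(c)[1][t]) for c, t in data) / len(data)


data = [(0, 0), (1, 1), (2, 2)]  # (context id, target token id), made up
loss_before = mean_loss(data)

LR = 0.5
for _ in range(200):
    for context_id, target in data:
        h, probs = predict(context_id)
        for k in range(NEW_VOCAB):
            # Gradient of cross-entropy with respect to logit k.
            grad = probs[k] - (1.0 if k == target else 0.0)
            for j in range(HIDDEN):
                head[k][j] -= LR * grad * h[j]  # only the head is updated

loss_after = mean_loss(data)
print(loss_before, loss_after)
print(backbone == backbone_before)  # backbone parameters are untouched
```

Because gradients are applied to `head` alone, the backbone table is identical before and after training, which is the property the freezing strategy relies on.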
Findings
Generation speed increased by a factor of 1.7.
Model performance on monolingual tasks was maintained.
Token fragmentation in the target language was reduced.
Abstract
Recent advancements in large language models (LLMs) have remarkably enhanced performance on a variety of tasks in multiple languages. However, tokenizers in LLMs trained primarily on English-centric corpora often excessively fragment text into character-level or Unicode-level tokens in non-Roman alphabetic languages, leading to inefficient text generation. We introduce a simple yet effective framework to accelerate text generation in such languages. Our approach involves employing a new language model head with a vocabulary set tailored to a specific target language for a pre-trained LLM. This is followed by fine-tuning the new head while incorporating a verification step to ensure the model's performance is preserved. We show that this targeted fine-tuning, while freezing other model parameters, effectively reduces token fragmentation for the target language. Our extensive experiments…
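The fragmentation problem described in the abstract can be illustrated with a worst-case sketch. It assumes a hypothetical tokenizer that falls back to one token per UTF-8 byte for any character outside its vocabulary; real tokenizers differ, and the helper names `byte_fallback_tokens` and `fragmentation_ratio` are made up for this example.

```python
import unicodedata


def byte_fallback_tokens(text: str) -> list[int]:
    """Hypothetical worst case: every character is split into its UTF-8 bytes."""
    normalized = unicodedata.normalize("NFC", text)
    return list(normalized.encode("utf-8"))


def fragmentation_ratio(text: str) -> float:
    """Tokens emitted per character under the byte-level worst case."""
    normalized = unicodedata.normalize("NFC", text)
    return len(byte_fallback_tokens(normalized)) / len(normalized)


english = "hello"
korean = "안녕하세요"  # five precomposed Hangul syllables

print(fragmentation_ratio(english))  # 1.0
print(fragmentation_ratio(korean))   # 3.0
```

Each Hangul syllable occupies three bytes in UTF-8, so under this assumption the Korean string needs three times as many decoding steps per character as the ASCII string. A head whose vocabulary contains whole target-language units would emit fewer tokens for the same text.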
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Handwritten Text Recognition Techniques
Methods: Sparse Evolutionary Training · Fragmentation · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
