Accelerating Multilingual Language Model for Excessively Tokenized Languages

Jimin Hong; Gibbeum Lee; Jaewoong Cho

arXiv:2401.10660 · cs.CL · August 7, 2024 · 2 cites

TL;DR

This paper presents a framework that accelerates text generation in multilingual language models for languages written in non-Roman scripts. It reduces token fragmentation by fine-tuning a new language-specific head, improving generation speed without loss of performance.

Contribution

The paper adds a language-specific head to a pre-trained multilingual model and fine-tunes only that head, keeping the other parameters frozen, to improve efficiency for excessively tokenized languages.

Findings

01 Generation speed increased by a factor of 1.7.

02 Model performance on target monolingual tasks is maintained.

03 Token fragmentation for the target language is effectively reduced.
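To make the link between a tailored vocabulary and fewer generation steps concrete, here is a minimal toy sketch. It is not the paper's method: `greedy_tokenize` and the four-entry `target_vocab` are hypothetical, and a character-level split stands in for the fragmented baseline. The step ratio it prints describes this toy only and is unrelated to the reported 1.7 figure.

```python
def greedy_tokenize(text, vocab):
    """Longest-match-first segmentation.

    Any character not covered by a vocabulary entry becomes its own
    single-character token (a stand-in for fragmented tokenization).
    """
    tokens = []
    max_len = max((len(v) for v in vocab), default=1)
    i = 0
    while i < len(text):
        # Try the longest candidate first, falling back to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical target-language vocabulary (illustration only).
target_vocab = {"안녕", "하세요", "세계", " "}
text = "안녕하세요 세계"

fragmented = greedy_tokenize(text, set())      # one token per character
tailored = greedy_tokenize(text, target_vocab)  # multi-character tokens

print(len(fragmented), fragmented)
print(len(tailored), tailored)
# An autoregressive decoder emits one token per step, so fewer tokens
# means fewer decoding steps for the same text.
print("step ratio:", len(fragmented) / len(tailored))
```

The ratio counts decoding steps only. Wall-clock speedup also depends on per-step cost, which this sketch does not model.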

Abstract

Recent advancements in large language models (LLMs) have remarkably enhanced performance on a variety of tasks in multiple languages. However, tokenizers in LLMs trained primarily on English-centric corpora often overly fragment text into character or Unicode-level tokens in non-Roman alphabetic languages, leading to inefficient text generation. We introduce a simple yet effective framework to accelerate text generation in such languages. Our approach involves employing a new language model head with a vocabulary set tailored to a specific target language for a pre-trained LLM. This is followed by fine-tuning the new head while incorporating a verification step to ensure the model's performance is preserved. We show that this targeted fine-tuning, while freezing other model parameters, effectively reduces token fragmentation for the target language. Our extensive experiments…
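The fragmentation the abstract describes can be illustrated with a short sketch. It assumes the worst case, in which a tokenizer has no entry for any piece of the text and falls back to one token per UTF-8 byte; `byte_fallback_tokens` is a hypothetical helper, not part of any tokenizer library, and real tokenizers usually fragment less than this.

```python
def byte_fallback_tokens(text: str) -> list[str]:
    """Worst-case segmentation: one token per UTF-8 byte.

    This mimics a tokenizer that has no vocabulary entry for any
    part of the input (an assumption made for illustration).
    """
    return [f"<0x{b:02X}>" for b in text.encode("utf-8")]


for sample in ["hello", "안녕하세요"]:
    tokens = byte_fallback_tokens(sample)
    print(f"{sample!r}: {len(sample)} characters -> {len(tokens)} byte tokens")
```

Both samples are five characters long, but each Hangul syllable occupies three bytes in UTF-8, so under this fallback the Korean string needs three times as many tokens as the English one.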

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Handwritten Text Recognition Techniques

Methods: Sparse Evolutionary Training · Fragmentation · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings