Language Adaptation on a Tight Academic Compute Budget: Tokenizer Swapping Works and Pure bfloat16 Is Enough

Konstantin Dobler; Gerard de Melo

arXiv:2408.15793·cs.CL·August 29, 2024

TL;DR

This paper studies continued pretraining of Mistral-7B for German and Arabic on limited hardware. It finds that pure bfloat16 training is a viable, faster alternative to mixed-precision training and that tokenizer swapping yields more efficient tokenization. The Arabic models outperform several baselines, while the German models underperform the base Mistral-7B.

Contribution

It shows that pure bfloat16 training is a viable and much faster alternative to mixed-precision training when only a few GPUs are available, and it evaluates tokenizer swapping for language adaptation under constrained compute.

Findings

01

Pure bfloat16 training is faster and viable for limited GPU setups.

02

Tokenizer swapping improves tokenization efficiency but has limited impact on German performance.

03

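Arabic models outperform several baselines, while German models underperform the base Mistral-7B, suggesting that continued pretraining is not always helpful for languages that are already well represented.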
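
Finding 01 can be made concrete with a rough estimate of training-state memory. The sketch below is a back-of-the-envelope calculation, not a figure from the paper. It assumes an Adam-style optimizer with two moment buffers per parameter, one common accounting of mixed precision (fp32 master weights, a bf16 working copy, fp32 gradients, fp32 moments), and a round 7 billion parameters. Activations and framework overhead are ignored, and all function and constant names are hypothetical.

```python
# Back-of-the-envelope training-state memory, in bytes per parameter.
# All byte counts are illustrative assumptions, not numbers from the paper.

BYTES_FP32 = 4
BYTES_BF16 = 2


def bytes_per_param_mixed_precision() -> int:
    """fp32 master weights + bf16 working copy + fp32 gradients + two fp32 Adam moments."""
    return BYTES_FP32 + BYTES_BF16 + BYTES_FP32 + 2 * BYTES_FP32


def bytes_per_param_pure_bf16() -> int:
    """bf16 weights + bf16 gradients + two bf16 Adam moments."""
    return BYTES_BF16 + BYTES_BF16 + 2 * BYTES_BF16


def training_state_gib(num_params: float, bytes_per_param: int) -> float:
    """Convert a parameter count and per-parameter cost into GiB."""
    return num_params * bytes_per_param / 2**30


NUM_PARAMS = 7e9  # round figure for a 7B-parameter model (assumption)

mixed_gib = training_state_gib(NUM_PARAMS, bytes_per_param_mixed_precision())
pure_gib = training_state_gib(NUM_PARAMS, bytes_per_param_pure_bf16())
print(f"mixed precision: {mixed_gib:.1f} GiB, pure bf16: {pure_gib:.1f} GiB")
```

Under these assumptions the training state shrinks from 18 to 8 bytes per parameter. A smaller state needs less sharding or offloading across devices, which is one plausible reason pure bfloat16 helps most on small GPU setups. The paper itself should be consulted for the measured speed-ups.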

Abstract

We investigate continued pretraining of LLMs for language adaptation on a tight academic budget: a setting in which only a few GPUs can be used in parallel, for a heavily constrained duration. We focus on adapting Mistral-7B to German or Arabic and evaluate several techniques to improve efficiency and effectiveness in this setting. Our German models adapted on this tight compute budget underperform compared to the base Mistral-7B, while our Arabic models outperform several baselines, showing that for sufficiently well-represented languages, continued pretraining for specialization is not always helpful. Our main findings focus on training precision and tokenizer swapping. Our results show that pure bfloat16 training is a viable alternative to mixed-precision training, while being much faster when only using a few GPUs. Swapping the tokenizer for a specialized one yields more efficient…
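
The abstract's point that a specialized tokenizer yields more efficient tokenization can be illustrated by counting tokens per word. The sketch below is a toy: it uses greedy longest-match segmentation over small hand-written vocabularies, not the BPE tokenizers a real model would use, and the vocabularies, sample text, and function names are hypothetical.

```python
def greedy_tokenize(word: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match segmentation; unknown spans fall back to single characters."""
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest remaining piece first, shrinking until something matches.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab or j == i + 1:  # a single character is always accepted
                tokens.append(piece)
                i = j
                break
    return tokens


def tokens_per_word(text: str, vocab: set[str]) -> float:
    """Average number of tokens needed per whitespace-separated word."""
    words = text.split()
    total = sum(len(greedy_tokenize(w, vocab)) for w in words)
    return total / len(words)


# Hypothetical vocabularies: a generic one and one extended with German words.
GENERIC_VOCAB = {"the", "ing", "er", "en", "sch"}
GERMAN_VOCAB = GENERIC_VOCAB | {"die", "geschwindigkeit", "ist", "hoch"}

text = "die geschwindigkeit ist hoch"
generic_rate = tokens_per_word(text, GENERIC_VOCAB)
german_rate = tokens_per_word(text, GERMAN_VOCAB)
print(f"generic: {generic_rate:.2f} tokens/word, specialized: {german_rate:.2f} tokens/word")
```

Fewer tokens per word means more text fits into a fixed context length and a fixed token budget, which is the sense of "efficient" assumed here.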

Code & Models

Repositories

konstantinjdobler/tight-budget-llm-adaptation
PyTorch · Official

Taxonomy

Topics: Scientific Computing and Data Management

Methods: Balanced Selection · Focus