Improving Training Efficiency and Reducing Maintenance Costs via Language Specific Model Merging
Alphaeus Dmonte, Vidhi Gupta, Daniel J Perry, Mark Arehart

TL;DR
This paper analyzes a language-specific model merging strategy for multilingual LLMs, demonstrating significant reductions in training and maintenance costs while maintaining model quality across multiple tasks and datasets.
Contribution
The paper presents what the authors describe as the first focused analysis of the efficiency of language-specific model merging, showing substantial reductions in training and maintenance costs for multilingual models.
Findings
Training time reduced by up to 50%
Maintenance costs reduced by over 60%
Effective on both academic and industry datasets
Abstract
Fine-tuning a task-specific multilingual large language model (LLM) involves training the model on a multilingual dataset with examples in all the required languages. Updating one or more supported languages with additional data or adding support for a new language involves retraining the model, which can be computationally inefficient and creates a severe maintenance bottleneck. Recent research on merging multilingual multitask models has shown promise in terms of improved quality, but its computational and maintenance efficiency remains unstudied. In this work, we provide the first focused analysis of this merging strategy from an efficiency perspective, evaluating it across three independent tasks. We demonstrate significant efficiency gains while maintaining parity in terms of quality: this merging approach reduces the initial training time by up to 50%. We also demonstrate that…
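The maintenance argument in the abstract can be illustrated with a toy sketch. The summary does not say which merging operator the authors use, so the code below assumes simple task-vector averaging (each language model's difference from a shared base is averaged and added back to the base). The names `Params`, `task_vector`, and `merge` are hypothetical, and small lists of floats stand in for real weight tensors.

```python
from typing import Dict, List

# Toy stand-in for a model's named weight tensors (assumption: real models
# would hold framework tensors, not Python lists).
Params = Dict[str, List[float]]


def task_vector(base: Params, finetuned: Params) -> Params:
    """Difference between a language-specific fine-tuned model and the shared base."""
    return {k: [f - b for f, b in zip(finetuned[k], base[k])] for k in base}


def merge(base: Params, language_models: Dict[str, Params], scale: float = 1.0) -> Params:
    """Add the scaled average of the per-language task vectors to the base weights.

    Assumption: plain averaging; the paper may use a different merging operator.
    """
    vectors = [task_vector(base, m) for m in language_models.values()]
    n = len(vectors)
    merged: Params = {}
    for name, weights in base.items():
        avg = [sum(v[name][i] for v in vectors) / n for i in range(len(weights))]
        merged[name] = [b + scale * a for b, a in zip(weights, avg)]
    return merged


# One fine-tuned model per language, all starting from the same base.
base = {"layer.weight": [0.0, 1.0]}
language_models = {
    "en": {"layer.weight": [0.2, 1.0]},
    "de": {"layer.weight": [0.0, 1.4]},
}
merged = merge(base, language_models)

# Updating one language: only that language's model is retrained (simulated
# here by swapping in new weights), then the cheap merge step is repeated.
language_models["de"] = {"layer.weight": [0.0, 1.8]}
merged_after_update = merge(base, language_models)
```

Under this assumption, refreshing the German data touches only the German model and the merge step, whereas a single jointly trained multilingual model would need to be retrained on all languages.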
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning and Data Classification
