Mashup Learning: Faster Finetuning by Remixing Past Checkpoints
Sofia Maria Lo Cicero Vaina, Artem Chumachenko, Max Ryabinin

TL;DR
Mashup Learning reuses checkpoints from prior training runs as an improved initialization for large language model finetuning, boosting accuracy and reducing training time across multiple benchmarks.
Contribution
Introduces a method to reuse and merge past checkpoints for better initialization, enhancing efficiency and performance in LLM fine-tuning.
Findings
Consistently improves downstream accuracy by 0.5-5 percentage points.
Reduces training steps by 41-46%.
Cuts total training time by up to 37%.
Abstract
Finetuning on domain-specific data is a well-established method for enhancing LLM performance on downstream tasks. Training on each dataset produces a new set of model weights, resulting in a multitude of checkpoints saved in-house or on open-source platforms. However, these training artifacts are rarely reused for subsequent experiments despite containing improved model abilities for potentially similar tasks. In this paper, we propose Mashup Learning, a simple method to leverage the outputs of prior training runs to enhance model adaptation to new tasks. Our procedure identifies the most relevant historical checkpoints for a target dataset, aggregates them with model merging, and uses the result as an improved initialization for training. Across 8 standard LLM benchmarks, four models, and two collections of source checkpoints, Mashup Learning consistently improves average downstream…
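The three steps named in the abstract (select relevant checkpoints, merge them, use the result as initialization) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: checkpoints are toy dictionaries of float lists, relevance is a user-supplied `score` function where lower is better (for instance, loss on a sample of the target dataset), and merging is plain uniform weight averaging. The names `select_top_k`, `merge_uniform`, and `mashup_init` are hypothetical.

```python
from typing import Callable, Dict, List

# Toy stand-in for a model state dict: parameter name -> flat list of floats.
Weights = Dict[str, List[float]]


def select_top_k(
    checkpoints: Dict[str, Weights],
    score: Callable[[Weights], float],
    k: int,
) -> List[str]:
    """Return the names of the k checkpoints with the lowest score.

    Assumption: a lower score means the checkpoint is more relevant
    to the target dataset (e.g. lower loss on a held-out sample).
    """
    ranked = sorted(checkpoints, key=lambda name: score(checkpoints[name]))
    return ranked[:k]


def merge_uniform(selected: List[Weights]) -> Weights:
    """Average parameters elementwise (uniform merging, an assumed choice)."""
    if not selected:
        raise ValueError("need at least one checkpoint to merge")
    n = len(selected)
    first = selected[0]
    return {
        key: [sum(w[key][i] for w in selected) / n for i in range(len(first[key]))]
        for key in first
    }


def mashup_init(
    checkpoints: Dict[str, Weights],
    score: Callable[[Weights], float],
    k: int = 2,
) -> Weights:
    """Select the k most relevant checkpoints and merge them into one initialization."""
    names = select_top_k(checkpoints, score, k)
    return merge_uniform([checkpoints[name] for name in names])


# Toy usage: three one-parameter "checkpoints" and a hypothetical target point.
toy_checkpoints = {
    "task_a": {"w": [1.0, 0.0]},
    "task_b": {"w": [0.0, 1.0]},
    "task_c": {"w": [5.0, 5.0]},
}
target = [1.0, 1.0]


def toy_score(weights: Weights) -> float:
    # Squared distance to the target, standing in for loss on target data.
    return sum((a - b) ** 2 for a, b in zip(weights["w"], target))


init = mashup_init(toy_checkpoints, toy_score, k=2)
```

In this toy setup `task_c` is farthest from the target and is excluded, so `init` is the average of `task_a` and `task_b`. In practice the merged weights would be loaded into the model before finetuning on the target dataset begins.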
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Topic Modeling · Machine Learning and Data Classification
