Transfer Learning for Finetuning Large Language Models
Tobias Strangmann, Lennart Purucker, Jörg K.H. Franke, Ivo Rapant, Fabio Ferreira, Frank Hutter

TL;DR
This paper introduces a transfer learning approach for finetuning large language models. It uses meta-learning to transfer knowledge about finetuning configurations from related tasks to a new task, with the aim of improving on zero-shot and default finetuning.
Contribution
It proposes a transfer learning method that meta-learns performance and cost surrogate models and relies on them, rather than on task-specific Bayesian optimization, to choose how to finetune large language models.
Findings
Transfer learning outperforms zero-shot and default finetuning.
Meta-learning surrogate models improves finetuning efficiency.
The method shows superior results on synthetic and real datasets.
Abstract
As the landscape of large language models expands, efficiently finetuning for specific tasks becomes increasingly crucial. At the same time, the landscape of parameter-efficient finetuning methods rapidly expands. Consequently, practitioners face a multitude of complex choices when searching for an optimal finetuning pipeline for large language models. To reduce the complexity for practitioners, we investigate transfer learning for finetuning large language models and aim to transfer knowledge about configurations from related finetuning tasks to a new task. In this work, we transfer learn finetuning by meta-learning performance and cost surrogate models for grey-box meta-optimization from a new meta-dataset. Counter-intuitively, we propose to rely only on transfer learning for new datasets. Thus, we do not use task-specific Bayesian optimization but prioritize knowledge transferred…
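The idea of choosing a finetuning configuration purely from transferred knowledge can be illustrated with a minimal sketch. It is not the authors' implementation: the surrogate here is a simple distance-weighted average over past runs (a stand-in for the learned performance and cost surrogates), the grey-box, multi-fidelity aspect is not modeled, and every name and number (`META_DATASET`, `lora_rank`, `lr`, the meta-features, scores, and costs) is a hypothetical toy value invented for illustration.

```python
from math import sqrt

# Hypothetical toy meta-dataset, invented for illustration only.
# Each record: (task meta-features, configuration, observed score, observed cost).
META_DATASET = [
    ((0.2, 0.9), {"lora_rank": 8, "lr": 1e-4}, 0.71, 1.0),
    ((0.2, 0.9), {"lora_rank": 64, "lr": 1e-4}, 0.78, 3.5),
    ((0.8, 0.1), {"lora_rank": 8, "lr": 1e-4}, 0.64, 1.1),
    ((0.8, 0.1), {"lora_rank": 64, "lr": 1e-4}, 0.60, 3.8),
    ((0.3, 0.8), {"lora_rank": 8, "lr": 1e-3}, 0.55, 0.9),
    ((0.3, 0.8), {"lora_rank": 64, "lr": 1e-4}, 0.80, 3.4),
]

SCORE, COST = 2, 3  # positions of the two targets inside a record


def config_key(config):
    """Hashable, order-independent identity for a configuration dict."""
    return tuple(sorted(config.items()))


def predict(task_features, config, target, meta_dataset):
    """Stand-in surrogate: distance-weighted mean of `target` over past
    runs of the same configuration, weighting similar tasks more heavily.
    Returns None if the configuration was never observed."""
    weighted_sum, weight_total = 0.0, 0.0
    for record in meta_dataset:
        features, past_config = record[0], record[1]
        if config_key(past_config) != config_key(config):
            continue
        dist = sqrt(sum((a - b) ** 2 for a, b in zip(task_features, features)))
        weight = 1.0 / (dist + 1e-6)
        weighted_sum += weight * record[target]
        weight_total += weight
    return weighted_sum / weight_total if weight_total > 0 else None


def select_config(task_features, candidates, cost_budget, meta_dataset):
    """Pick the candidate with the best predicted score whose predicted
    cost fits the budget. No run on the new task is needed to decide."""
    best, best_score = None, float("-inf")
    for config in candidates:
        score = predict(task_features, config, SCORE, meta_dataset)
        cost = predict(task_features, config, COST, meta_dataset)
        if score is None or cost is None or cost > cost_budget:
            continue
        if score > best_score:
            best, best_score = config, score
    return best


CANDIDATES = [
    {"lora_rank": 8, "lr": 1e-4},
    {"lora_rank": 64, "lr": 1e-4},
    {"lora_rank": 8, "lr": 1e-3},
]

if __name__ == "__main__":
    new_task = (0.25, 0.85)  # hypothetical meta-features of an unseen task
    print(select_config(new_task, CANDIDATES, 5.0, META_DATASET))
    print(select_config(new_task, CANDIDATES, 2.0, META_DATASET))
```

In this sketch the decision for the new task comes entirely from runs on related tasks, which mirrors the abstract's choice to prioritize transferred knowledge over task-specific optimization. Tightening `cost_budget` shifts the choice toward cheaper configurations, which is the role a cost surrogate plays alongside the performance surrogate.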
Taxonomy
Topics: Topic Modeling · Speech Recognition and Synthesis · Natural Language Processing Techniques
