Mixture-of-Skills: Learning to Optimize Data Usage for Fine-Tuning Large Language Models

Minghao Wu; Thuy-Trang Vu; Lizhen Qu; Gholamreza Haffari

arXiv:2406.08811 · cs.CL · October 8, 2024

TL;DR

This paper introduces Mixture-of-Skills, a reinforcement learning framework that dynamically optimizes dataset usage during fine-tuning of large language models, improving skill development and overall performance.

Contribution

The paper presents a novel, model-agnostic reinforcement learning method for automatic data balancing in LLM fine-tuning, and extends it with MoSpec for task-specific dataset utility optimization.

Findings

01. MoS significantly improves LLM performance across benchmarks.

02. Dynamic data balancing enhances skill development during fine-tuning.

03. MoSpec effectively tailors dataset utility for specific tasks.

Abstract

Large language models (LLMs) are typically fine-tuned on diverse and extensive datasets sourced from various origins to develop a comprehensive range of skills, such as writing, reasoning, chatting, coding, and more. Each skill has unique characteristics, and these datasets are often heterogeneous and imbalanced, making the fine-tuning process highly challenging. Balancing the development of each skill while ensuring the model maintains its overall performance requires sophisticated techniques and careful dataset curation. In this work, we propose a general, model-agnostic, reinforcement learning framework, Mixture-of-Skills (MoS), that learns to optimize data usage automatically during the fine-tuning process. This framework ensures the optimal comprehensive skill development of LLMs by dynamically adjusting the focus on different datasets based on their current learning state. To…
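
The idea of shifting focus across datasets as training progresses can be illustrated with a small toy. The sketch below is not the authors' implementation: it assumes one learnable score per dataset, a softmax over those scores as the sampling distribution, and a REINFORCE-style update with a running-mean baseline. The names `DatasetSampler` and `toy_reward`, the fixed reward values, and the learning rate are all hypothetical stand-ins; the abstract does not specify what reward signal MoS uses.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class DatasetSampler:
    """Hypothetical sampler holding one learnable logit per dataset."""

    def __init__(self, num_datasets, lr=0.1, seed=0):
        self.logits = [0.0] * num_datasets  # uniform sampling at the start
        self.lr = lr
        self.baseline = 0.0  # running mean reward, used to reduce variance
        self.rng = random.Random(seed)

    def probabilities(self):
        return softmax(self.logits)

    def sample(self):
        """Draw the index of the dataset to train on next."""
        probs = self.probabilities()
        return self.rng.choices(range(len(probs)), weights=probs, k=1)[0]

    def update(self, chosen, reward):
        """REINFORCE step: d log p(chosen) / d logit_j = 1[j == chosen] - p_j."""
        probs = self.probabilities()
        advantage = reward - self.baseline
        for j, p in enumerate(probs):
            grad = (1.0 if j == chosen else 0.0) - p
            self.logits[j] += self.lr * advantage * grad
        self.baseline = 0.9 * self.baseline + 0.1 * reward


def toy_reward(dataset_index):
    """Assumed stand-in for a real learning signal; dataset 2 is most useful."""
    return [0.1, 0.3, 0.9][dataset_index]


sampler = DatasetSampler(num_datasets=3)
for _ in range(2000):
    i = sampler.sample()
    # In real fine-tuning, a training step on a batch from dataset i goes here,
    # and the reward would be measured from the model's current learning state.
    sampler.update(i, toy_reward(i))

final_probs = sampler.probabilities()
```

In this toy the reward is fixed per dataset, so the sampler simply drifts toward the most rewarding one. In an actual fine-tuning loop the reward would change as the model learns, which is what lets the sampling distribution keep adapting.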

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Focus