Scalable Fine-tuning from Multiple Data Sources: A First-Order Approximation Approach
Dongyue Li, Ziniu Zhang, Lu Wang, Hongyang R. Zhang

TL;DR
This paper presents a fast, gradient-based method for selecting optimal auxiliary tasks to improve language model fine-tuning, significantly reducing computation time while maintaining high accuracy.
Contribution
Introduces a novel first-order approximation algorithm for efficient subset selection in LM fine-tuning, avoiding repeated training on task subsets.
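As a generic illustration of what a first-order approximation means (not the paper's algorithm, whose details are not given on this page), the sketch below linearizes a toy model output around an initialization and compares the linear estimate with the exact value after a small parameter change. The logistic model and the names `theta0`, `theta`, `x`, and `first_order_estimate` are all assumptions made for the example.

```python
import math
from typing import List

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def dot(a: List[float], b: List[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))

def model_output(theta: List[float], x: List[float]) -> float:
    # Toy stand-in for a model's output: logistic function of theta . x
    return sigmoid(dot(theta, x))

def first_order_estimate(theta0: List[float], theta: List[float], x: List[float]) -> float:
    # Taylor expansion around theta0:
    #   f(theta) ~= f(theta0) + grad f(theta0) . (theta - theta0)
    s = model_output(theta0, x)
    grad = [s * (1.0 - s) * xi for xi in x]      # gradient of sigmoid(theta . x) w.r.t. theta
    delta = [t - t0 for t, t0 in zip(theta, theta0)]
    return s + dot(grad, delta)

# Hypothetical values: theta0 plays the role of an initialization,
# theta is a nearby parameter vector (e.g. after a small update).
theta0 = [0.5, -0.25]
theta = [0.55, -0.27]
x = [1.0, 2.0]

exact = model_output(theta, x)
approx = first_order_estimate(theta0, theta, x)
gap = abs(exact - approx)
```

The estimate uses only the output and gradient at `theta0`, so many nearby parameter settings can be scored without recomputing the model at each one. The approximation is only reliable while `theta` stays close to `theta0`.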
Findings
Achieves 30x speedup over traditional subset selection methods.
Maintains only 1% error in estimating true fine-tuning performance.
Improves downstream task performance by up to 3.8% over existing methods.
Abstract
We study the problem of fine-tuning a language model (LM) for a target task by optimally using the information from auxiliary tasks. This problem has broad applications in NLP, such as targeted instruction tuning and data selection in chain-of-thought fine-tuning. The key challenge of this problem is that not all auxiliary tasks are beneficial in improving the performance of the target task. Thus, selecting the right subset of auxiliary tasks is crucial. Conventional subset selection methods, such as forward and backward stepwise selection, are unsuitable for LM fine-tuning because they require repeated training on subsets of auxiliary tasks. This paper introduces a new algorithm for estimating model fine-tuning performance without requiring repeated training. Our algorithm first performs multitask training using data from all tasks to obtain a meta initialization. Then, we…
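To make the cost of conventional selection concrete, here is a minimal sketch of forward stepwise selection that counts how often a subset has to be scored. In the setting the abstract describes, each call to `score` would correspond to fine-tuning on one subset of auxiliary tasks; in this sketch it is replaced by a hypothetical lookup (`toy_score`, with made-up task names and gains) so that the code runs on its own.

```python
from typing import Callable, FrozenSet, List, Tuple

def forward_stepwise(
    tasks: List[str],
    score: Callable[[FrozenSet[str]], float],
) -> Tuple[List[str], float, int]:
    """Greedily add the task that most improves the score; stop when none helps."""
    selected: List[str] = []
    remaining = list(tasks)
    best = score(frozenset())   # baseline: target task alone
    calls = 1                   # number of subsets scored so far
    while remaining:
        candidates = []
        for t in remaining:
            # In practice, each of these calls is a separate fine-tuning run.
            candidates.append((score(frozenset(selected + [t])), t))
            calls += 1
        top_score, top_task = max(candidates)
        if top_score <= best:
            break               # no remaining task improves the score
        selected.append(top_task)
        remaining.remove(top_task)
        best = top_score
    return selected, best, calls

# Hypothetical per-task effects on target accuracy (illustrative numbers only).
GAINS = {"a": 0.03, "b": -0.01, "c": 0.02}

def toy_score(subset: FrozenSet[str]) -> float:
    return 0.70 + sum(GAINS[t] for t in subset)

selected, best, calls = forward_stepwise(list(GAINS), toy_score)
```

With three candidate tasks the loop already scores seven subsets, and the count grows roughly quadratically with the number of tasks. This is the repeated training that the abstract says makes stepwise selection unsuitable for LM fine-tuning.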
Taxonomy
Topics: Flow Measurement and Analysis · Neural Networks and Applications · Reservoir Engineering and Simulation Methods
Methods: Balanced Selection
