Scalable Fine-tuning from Multiple Data Sources: A First-Order Approximation Approach

Dongyue Li; Ziniu Zhang; Lu Wang; Hongyang R. Zhang

arXiv:2409.19458 · cs.CL · June 3, 2025

TL;DR

This paper presents a fast, gradient-based method for selecting the auxiliary tasks that improve language model fine-tuning. It reports a 30x speedup over conventional subset selection while keeping the error in estimated fine-tuning performance to 1%.

Contribution

Introduces a novel first-order approximation algorithm for efficient subset selection in LM fine-tuning, avoiding repeated training on task subsets.
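As a rough illustration of what a first-order approximation buys, the sketch below estimates a loss after a small parameter change using only the loss value and gradient at the starting point, so nothing has to be re-optimized. It is a generic Taylor-expansion example on a toy quadratic loss, not the paper's algorithm; `loss`, `grad`, `first_order_estimate`, `theta0`, and `delta` are all hypothetical names chosen for this example.

```python
def loss(theta):
    # Toy quadratic loss standing in for a fine-tuning objective (assumption).
    return sum((t - 1.0) ** 2 for t in theta)


def grad(theta):
    # Analytic gradient of the toy loss above.
    return [2.0 * (t - 1.0) for t in theta]


def first_order_estimate(theta0, delta):
    # First-order Taylor expansion around theta0:
    #   loss(theta0 + delta) ~= loss(theta0) + <grad(theta0), delta>
    g = grad(theta0)
    return loss(theta0) + sum(gi * di for gi, di in zip(g, delta))


theta0 = [0.9, 1.1]    # plays the role of a shared starting point
delta = [0.01, -0.02]  # small parameter change, as from brief fine-tuning

estimate = first_order_estimate(theta0, delta)
exact = loss([t + d for t, d in zip(theta0, delta)])
gap = abs(exact - estimate)  # second-order remainder of the expansion
```

For this toy loss the gap equals the squared norm of `delta`, so the estimate is tight only when the fine-tuned parameters stay close to the starting point.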

Findings

01. Achieves a 30x speedup over traditional subset selection methods.

02. Maintains only 1% error in estimating true fine-tuning performance.

03. Improves downstream task performance by up to 3.8% over existing methods.

Abstract

We study the problem of fine-tuning a language model (LM) for a target task by optimally using the information from $n$ auxiliary tasks. This problem has broad applications in NLP, such as targeted instruction tuning and data selection in chain-of-thought fine-tuning. The key challenge of this problem is that not all auxiliary tasks are beneficial in improving the performance of the target task. Thus, selecting the right subset of auxiliary tasks is crucial. Conventional subset selection methods, such as forward and backward stepwise selection, are unsuitable for LM fine-tuning because they require repeated training on subsets of auxiliary tasks. This paper introduces a new algorithm for estimating model fine-tuning performance without requiring repeated training. Our algorithm first performs multitask training using data from all tasks to obtain a meta initialization. Then, we…
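To make the cost of the conventional baseline concrete, here is a minimal sketch of forward stepwise selection over auxiliary tasks. The `evaluate` callback is a hypothetical stand-in for "fine-tune on this subset and measure target-task performance"; in practice each call would be a full training run, which is the repeated training the abstract refers to. The toy `gains` scores are invented purely for illustration.

```python
def forward_stepwise_selection(n_tasks, evaluate):
    """Greedily add the auxiliary task that most improves the score.

    `evaluate(subset)` is a hypothetical callback; each call stands in
    for one fine-tuning run on the given subset of auxiliary tasks.
    """
    selected = []
    best_score = evaluate(frozenset())
    calls = 1
    improved = True
    while improved:
        improved = False
        best_candidate = None
        for task in range(n_tasks):
            if task in selected:
                continue
            score = evaluate(frozenset(selected + [task]))
            calls += 1
            if score > best_score:
                best_score, best_candidate = score, task
        if best_candidate is not None:
            selected.append(best_candidate)
            improved = True
    return selected, best_score, calls


# Toy scores (assumption): tasks 0 and 2 help the target task, 1 and 3 hurt it.
gains = [1.0, -1.0, 0.5, -0.5]


def toy_evaluate(subset):
    return sum(gains[t] for t in subset)


selected, score, calls = forward_stepwise_selection(len(gains), toy_evaluate)
```

Even with four tasks, the greedy search makes ten evaluations. In the worst case the count grows quadratically with the number of auxiliary tasks, which is why a method that estimates subset performance without retraining is attractive.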

Code & Models

Repositories

VirtuosoResearch/Scalable-finetuning
PyTorch · Official

Videos

Scalable Fine-tuning from Multiple Data Sources: A First-Order Approximation Approach · Underline

Taxonomy

Topics: Flow Measurement and Analysis · Neural Networks and Applications · Reservoir Engineering and Simulation Methods

Methods: Balanced Selection