VersaTune: An Efficient Data Composition Framework for Training Multi-Capability LLMs

Keer Lu; Keshi Zhao; Zhuoran Zhang; Zheng Liang; Da Pan; Shusen Zhang; Xin Wu; Guosheng Dong; Bin Cui; Tengjiao Wang; Wentao Zhang

arXiv:2411.11266·cs.CL·May 20, 2025

TL;DR

VersaTune is a data composition framework that enhances multi-domain capabilities of large language models during training by dynamically adjusting domain weights, leading to significant performance improvements and better knowledge retention.

Contribution

The paper introduces VersaTune, a novel method for data composition that improves multi-domain proficiency in LLMs by dynamically balancing domain knowledge during training.

Findings

01. 35.21% improvement in multi-ability performance

02. Qwen-2.5-32B + VersaTune surpasses frontier models

03. 38.77% reduction in performance degradation across domains

Abstract

As demonstrated by proprietary Large Language Models (LLMs) such as the GPT and Claude series, LLMs have the potential to achieve remarkable proficiency across a wide range of domains, including law, medicine, finance, science, and code, all within a single model. These capabilities are further augmented during the Supervised Fine-Tuning (SFT) phase. Despite this potential, existing work mainly focuses on domain-specific enhancements during fine-tuning, which risks catastrophic forgetting of knowledge in other domains. In this study, we introduce **VersaTune**, a novel data composition framework designed to enhance LLMs' overall multi-domain capabilities during training. We begin by detecting the distribution of domain-specific knowledge within the base model, then compose the training data to align with the model's existing knowledge…
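
The two steps named in the abstract (estimate a per-domain knowledge distribution, then compose training data to match it) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `composition_from_scores` and `allocate_budget`, the example scores, and the simple normalization are all assumptions made for illustration, and how VersaTune actually detects domain knowledge is not specified in the text above.

```python
from typing import Dict


def composition_from_scores(scores: Dict[str, float]) -> Dict[str, float]:
    """Normalize hypothetical per-domain knowledge scores into mixing proportions.

    Assumption: each score is a non-negative number summarizing how much
    knowledge the base model already holds for that domain.
    """
    if any(s < 0 for s in scores.values()):
        raise ValueError("scores must be non-negative")
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("at least one score must be positive")
    return {domain: s / total for domain, s in scores.items()}


def allocate_budget(proportions: Dict[str, float], budget: int) -> Dict[str, int]:
    """Split a total example budget across domains by largest remainder,
    so the per-domain counts always sum exactly to the budget."""
    raw = {d: p * budget for d, p in proportions.items()}
    counts = {d: int(r) for d, r in raw.items()}
    leftover = budget - sum(counts.values())
    # Hand the remaining examples to the domains with the largest fractional parts.
    by_remainder = sorted(raw, key=lambda d: raw[d] - counts[d], reverse=True)
    for d in by_remainder[:leftover]:
        counts[d] += 1
    return counts


if __name__ == "__main__":
    # Illustrative scores only; they are not taken from the paper.
    example_scores = {"law": 2.0, "medicine": 1.0, "code": 1.0}
    mix = composition_from_scores(example_scores)
    print(mix)
    print(allocate_budget(mix, 10))
```

The largest-remainder allocation is one simple design choice for turning fractional proportions into whole example counts. A framework that adjusts domain weights dynamically, as the TL;DR describes, would presumably recompute the proportions during training rather than fix them once.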

Code & Models

Repositories

8023looker/versatune (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Attention Is All You Need · Softmax · Cosine Annealing · Attention Dropout · Residual Connection · Linear Layer · Byte Pair Encoding · Weight Decay · Dropout