VersaTune: An Efficient Data Composition Framework for Training Multi-Capability LLMs
Keer Lu, Keshi Zhao, Zhuoran Zhang, Zheng Liang, Da Pan, Shusen Zhang, Xin Wu, Guosheng Dong, Bin Cui, Tengjiao Wang, Wentao Zhang

TL;DR
VersaTune is a data composition framework that enhances multi-domain capabilities of large language models during training by dynamically adjusting domain weights, leading to significant performance improvements and better knowledge retention.
Contribution
The paper introduces VersaTune, a novel method for data composition that improves multi-domain proficiency in LLMs by dynamically balancing domain knowledge during training.
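This summary does not say how the domain weights are adjusted, so the sketch below only illustrates the general idea of rebalancing a domain mixture between training steps. It uses a generic multiplicative update followed by renormalization, which is not claimed to be VersaTune's rule. The function name `update_domain_weights`, the `gains` signal (for example, a recent per-domain drop in validation loss) and the `step` size are all hypothetical.

```python
import math

def update_domain_weights(weights, gains, step=0.5):
    """Shift sampling weight toward domains with a larger recent gain.

    weights: dict mapping domain -> current sampling weight (sums to 1).
    gains:   dict mapping domain -> hypothetical progress signal (assumption).
    step:    how strongly the signal moves the weights (assumption).
    """
    # Multiplicative update: a positive gain scales a domain up, a negative one down.
    scaled = {d: w * math.exp(step * gains[d]) for d, w in weights.items()}
    # Renormalize so the result is again a probability distribution.
    total = sum(scaled.values())
    return {d: v / total for d, v in scaled.items()}

# Illustrative numbers only, not taken from the paper.
weights = {"law": 0.25, "medicine": 0.25, "code": 0.5}
gains = {"law": 0.0, "medicine": 0.2, "code": -0.1}
new_weights = update_domain_weights(weights, gains)
```

The weights remain a valid distribution after every update, so they can be fed straight back into a sampler for the next training phase.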
Findings
35.21% improvement in multi-ability performance
Qwen-2.5-32B + VersaTune surpasses frontier models
38.77% reduction in performance degradation across domains
Abstract
As proprietary Large Language Models (LLMs) such as the GPT and Claude series demonstrate, LLMs have the potential to achieve remarkable proficiency across a wide range of domains, including law, medicine, finance, science, and code, all within a single model. These capabilities are further augmented during the Supervised Fine-Tuning (SFT) phase. Despite this potential, existing work mainly focuses on domain-specific enhancements during fine-tuning, which risks catastrophic forgetting of knowledge in other domains. In this study, we introduce **VersaTune**, a novel data composition framework designed for enhancing LLMs' overall multi-domain capabilities during training. We begin by detecting the distribution of domain-specific knowledge within the base model, and then compose the training data to align with the model's existing knowledge…
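As a rough illustration of composing training data to match a domain distribution, the sketch below samples a mixture whose domain shares follow a given probability vector. How VersaTune estimates the base model's knowledge distribution is not described in this excerpt, so `domain_probs` is filled with made-up values, and `compose_mixture`, `pools`, and the example strings are hypothetical names used only for this sketch.

```python
import random
from collections import Counter

def compose_mixture(domain_probs, pools, total, seed=0):
    """Draw `total` examples so each domain's share follows `domain_probs`."""
    rng = random.Random(seed)
    domains = list(domain_probs)
    probs = [domain_probs[d] for d in domains]
    # Choose a domain for every slot in the mixture, proportional to its weight.
    picks = rng.choices(domains, weights=probs, k=total)
    # Draw one example (with replacement) from the chosen domain's pool.
    return [(d, rng.choice(pools[d])) for d in picks]

# Hypothetical knowledge distribution of a base model (illustrative numbers only).
domain_probs = {"law": 0.10, "medicine": 0.20, "finance": 0.10,
                "science": 0.25, "code": 0.35}
# Placeholder data pools standing in for real per-domain SFT datasets.
pools = {d: [f"{d}-example-{i}" for i in range(100)] for d in domain_probs}

mixture = compose_mixture(domain_probs, pools, total=10_000)
counts = Counter(d for d, _ in mixture)
```

With a large enough `total`, the empirical shares in `counts` approach `domain_probs`, which is the property a distribution-aligned mixture needs.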
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Attention Is All You Need · Softmax · Cosine Annealing · Attention Dropout · Residual Connection · Linear Layer · Byte Pair Encoding · Weight Decay · Dropout
