Multi-task Code LLMs: Data Mix or Model Merge?
Mingzhi Zhu, Boris Sobolev, Rahul Krishna, Raju Pavuluri, Stacy Patterson, Michele Merler

TL;DR
This paper compares data mixing and model merging as strategies for building efficient multi-task code LLMs. It finds that model merging performs better at the larger scale (7B parameters), while data mixing is preferable at the smaller scale (2B parameters).
Contribution
The paper provides an extensive empirical comparison of data mixing versus model merging for multi-task code LLMs across different scales, and introduces a weight analysis technique for understanding task effects.
Findings
Model merging achieves superior performance at larger scales.
Merged models can outperform task-specific fine-tuned models.
Data mixing is preferred at smaller scales.
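To make the two strategies concrete, here is a minimal sketch in plain Python. It assumes the simplest possible form of each: merging as linear interpolation between two fine-tuned checkpoints, and mixing as concatenating and shuffling the two task datasets before a single fine-tuning run. The function names, the `alpha` parameter, and the dict-of-lists weight format are hypothetical and chosen for illustration; the merging methods actually evaluated in the paper may differ.

```python
import random


def merge_weights(gen_weights, sum_weights, alpha=0.5):
    """Linearly interpolate two checkpoints with identical parameter names.

    Assumption: each checkpoint is a dict mapping a parameter name to a flat
    list of floats. alpha is the weight given to the code-generation model.
    """
    if gen_weights.keys() != sum_weights.keys():
        raise ValueError("checkpoints must have identical parameter names")
    merged = {}
    for name in gen_weights:
        g, s = gen_weights[name], sum_weights[name]
        if len(g) != len(s):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [alpha * gv + (1.0 - alpha) * sv for gv, sv in zip(g, s)]
    return merged


def mix_datasets(gen_examples, sum_examples, seed=0):
    """Concatenate and shuffle two task datasets for one joint fine-tuning run."""
    mixed = list(gen_examples) + list(sum_examples)
    random.Random(seed).shuffle(mixed)
    return mixed
```

The practical difference is where the tasks are combined: merging needs one fine-tuning run per task and then combines the resulting weights, whereas mixing combines the data first and needs only one run.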
Abstract
Recent research advocates deploying smaller, specialized code LLMs in agentic frameworks alongside frontier models, sparking interest in efficient strategies for multi-task learning that balance performance, constraints, and costs. We compare two approaches for creating small, multi-task code LLMs: data mixing versus model merging. We conduct extensive experiments across two model families (Qwen Coder and DeepSeek Coder) at two scales (2B and 7B parameters), fine-tuning them for code generation and code summarization tasks. Our evaluation on HumanEval, MBPP, and CodeXGlue benchmarks reveals that model merging achieves the best overall performance at larger scale across model families, retaining 96% of specialized model performance on code generation tasks while maintaining summarization capabilities. Notably, merged models can even surpass individually fine-tuned models, with our best…
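The "96% of specialized model performance" figure can be read as a retention ratio: the merged model's benchmark score divided by the score of the model fine-tuned only for that task. The sketch below shows that arithmetic under this reading. The function name and the scores used in the example are hypothetical placeholders, not results from the paper.

```python
def retention(merged_score, specialized_score):
    """Fraction of the specialized model's score that the merged model keeps."""
    if specialized_score <= 0:
        raise ValueError("specialized_score must be positive")
    return merged_score / specialized_score


# Hypothetical scores for illustration only (not taken from the paper).
example = retention(merged_score=48.0, specialized_score=50.0)
print(f"retained {example:.0%} of specialized performance")
```

A ratio above 1.0 would correspond to the case where a merged model surpasses the individually fine-tuned one.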
Taxonomy
Topics: Software Engineering Research · Topic Modeling · Multimodal Machine Learning Applications
