WaveCoder: Widespread And Versatile Enhancement For Code Large Language Models By Instruction Tuning
Zhaojian Yu, Xin Zhang, Ning Shang, Yangyu Huang, Can Xu, Yishujie Zhao, Wenxiang Hu, Qiufeng Yin

TL;DR
WaveCoder introduces a versatile instruction tuning approach with a new dataset, significantly enhancing large language models' ability to generalize across diverse complex code-related tasks.
Contribution
The paper presents WaveCoder, a series of Code LLMs trained with a novel, high-quality, multi-task instruction dataset, improving generalization in complex code tasks.
Findings
WaveCoder models outperform existing open-source models in task generalization.
WaveCoder-Ultra-6.7B achieves state-of-the-art results across multiple code tasks.
The approach enhances multi-task performance in code-related applications.
Abstract
Recent work demonstrates that, after instruction tuning, Code Large Language Models (Code LLMs) can obtain impressive capabilities to address a wide range of code-related tasks. However, current instruction tuning methods for Code LLMs mainly focus on the traditional code generation task, resulting in poor performance in complex multi-task scenarios. In this paper, we concentrate on multiple code-related tasks and present WaveCoder, a series of Code LLMs trained with Widespread And Versatile Enhanced instruction data. To enable the models to tackle complex code-related tasks, we propose a method to stably generate diverse, high-quality instruction data from open source code dataset in multi-task scenarios and obtain CodeSeaXDataset, a dataset comprising 19,915 instruction instances across 4 code-related tasks, which is aimed at improving the generalization ability of Code LLM. Our…
Code & Models
- 🤗 microsoft/wavecoder-ds-6.7b (model) · 36 dl · ♡ 5
- 🤗 microsoft/wavecoder-pro-6.7b (model) · 38 dl · ♡ 6
- 🤗 microsoft/wavecoder-ultra-6.7b (model) · 145 dl · ♡ 80
- 🤗 lmstudio-community/wavecoder-ultra-6.7b-GGUF (model) · 136 dl · ♡ 11
- 🤗 Vezora/WaveCoder-6.7b-Ultra-bf16 (model) · 4 dl
- 🤗 QuantFactory/wavecoder-ds-6.7b-GGUF (model) · 69 dl · ♡ 1
- 🤗 QuantFactory/wavecoder-ultra-6.7b-GGUF (model) · 25 dl · ♡ 1
- 🤗 RichardErkhov/Vezora_-_WaveCoder-6.7b-Ultra-bf16-gguf (model) · 14 dl
Taxonomy
Topics: Software Testing and Debugging Techniques · Software System Performance and Reliability · Parallel Computing and Optimization Techniques
Methods: Focus
