Magicoder: Empowering Code Generation with OSS-Instruct
Yuxiang Wei, Zhe Wang, Jiawei Liu, Yifeng Ding, Lingming Zhang

TL;DR
Magicoder introduces open-source large language models for code, trained on synthetic instruction data enhanced with open-source code snippets, achieving competitive performance and surpassing some state-of-the-art models on coding benchmarks.
Contribution
The paper presents OSS-Instruct, a novel method that uses open-source code snippets to generate diverse instruction data, significantly improving code LLM performance with models of no more than 7B parameters.
Findings
Magicoder models outperform similar-sized state-of-the-art code models.
MagicoderS-CL-7B surpasses ChatGPT on the HumanEval+ benchmark.
Open-source data enhances the realism and controllability of synthetic instructions.
Abstract
We introduce Magicoder, a series of fully open-source (code, weights, and data) Large Language Models (LLMs) for code that significantly closes the gap with top code models while having no more than 7B parameters. Magicoder models are trained on 75K synthetic instruction data using OSS-Instruct, a novel approach to enlightening LLMs with open-source code snippets to generate diverse instruction data for code. Our main motivation is to mitigate the inherent bias of the synthetic data generated by LLMs through the wealth of open-source references for the production of more realistic and controllable data. The orthogonality of OSS-Instruct and other data generation methods like Evol-Instruct further enables us to build an enhanced MagicoderS. Both Magicoder and MagicoderS substantially outperform state-of-the-art code models with similar or even larger sizes on a wide range of coding…
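The idea described in the abstract, seeding an LLM with an open-source code snippet so that it writes a new coding problem and solution, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the prompt wording, the `[Problem]`/`[Solution]` markers, the seed-length bounds, and every function name (`extract_seed`, `build_prompt`, `parse_response`, `generate_pair`) are assumptions made for the example, and `llm` stands for any user-supplied callable that maps a prompt string to a completion string.

```python
import random
from typing import Callable, Dict, Optional

# Hypothetical prompt template; the wording is an assumption, not the paper's prompt.
PROMPT_TEMPLATE = (
    "Gain inspiration from the following code snippet and create a "
    "self-contained programming problem with a correct solution.\n\n"
    "Code snippet:\n{seed}\n\n"
    "Answer in two sections marked [Problem] and [Solution]."
)


def extract_seed(document: str, min_lines: int = 1, max_lines: int = 15,
                 rng: Optional[random.Random] = None) -> str:
    """Pick a random run of consecutive lines from a source file as the seed."""
    rng = rng or random.Random()
    lines = document.splitlines()
    # Never ask for more lines than the document has.
    n = min(len(lines), rng.randint(min_lines, max_lines))
    start = rng.randint(0, len(lines) - n)
    return "\n".join(lines[start:start + n])


def build_prompt(seed: str) -> str:
    """Embed the seed snippet in the (assumed) instruction-generation prompt."""
    return PROMPT_TEMPLATE.format(seed=seed)


def parse_response(text: str) -> Dict[str, str]:
    """Split a completion into its problem and solution parts."""
    head, sep, solution = text.partition("[Solution]")
    if not sep or "[Problem]" not in head:
        raise ValueError("response is missing a [Problem] or [Solution] section")
    problem = head.split("[Problem]", 1)[1]
    return {"problem": problem.strip(), "solution": solution.strip()}


def generate_pair(document: str, llm: Callable[[str], str],
                  rng: Optional[random.Random] = None) -> Dict[str, str]:
    """Produce one instruction/response pair from one open-source document."""
    seed = extract_seed(document, rng=rng)
    pair = parse_response(llm(build_prompt(seed)))
    pair["seed"] = seed  # keep the seed so the pair can be traced to its source
    return pair
```

Passing the model as a plain callable keeps the sketch independent of any particular API client; repeating `generate_pair` over many source files would yield a dataset of the kind the abstract describes, before any filtering or decontamination steps.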
Code & Models
- 🤗 ise-uiuc/Magicoder-CL-7B · model · 29 dl · ♡ 21
- 🤗 ise-uiuc/Magicoder-S-CL-7B · model · 222 dl · ♡ 44
- 🤗 ise-uiuc/Magicoder-DS-6.7B · model · 32 dl · ♡ 38
- 🤗 ise-uiuc/Magicoder-S-DS-6.7B · model · 4.7k dl · ♡ 205
- 🤗 TheBloke/Magicoder-S-DS-6.7B-GGUF · model · 735 dl · ♡ 76
- 🤗 TheBloke/Magicoder-S-DS-6.7B-AWQ · model · 12 dl · ♡ 9
- 🤗 TheBloke/Magicoder-S-DS-6.7B-GPTQ · model · 17 dl · ♡ 7
- 🤗 LoneStriker/Magicoder-S-CL-7B-3.0bpw-h6-exl2-2 · model · 4 dl · ♡ 1
- 🤗 LoneStriker/Magicoder-S-CL-7B-4.0bpw-h6-exl2-2 · model · 4 dl · ♡ 1
- 🤗 LoneStriker/Magicoder-S-CL-7B-5.0bpw-h6-exl2-2 · model · 4 dl · ♡ 1
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Computational Physics and Python Applications
