Aurora: Activating Chinese chat capability for Mixtral-8x7B sparse Mixture-of-Experts through Instruction-Tuning

Rongsheng Wang; Haoming Chen; Ruizhe Zhou; Yaofei Duan; Kunyan Cai; Han Ma; Jiaxi Cui; Jian Li; Patrick Cheong-Iao Pang; Yapeng Wang; Tao Tan

arXiv:2312.14557·cs.CL·January 2, 2024·1 cites

Open Access · 1 Repo · 3 Models

TL;DR

This paper introduces Aurora, a Chinese conversational model built by instruction fine-tuning Mixtral-8x7B on three Chinese instruction-following datasets. It reports improved results on the C-Eval, MMLU, and CMMLU benchmarks.

Contribution

The authors present this as the first work to apply instruction fine-tuning to a sparse Mixture-of-Experts model in order to improve its Chinese conversational capabilities.

Findings

01 Aurora outperforms baseline models on C-Eval, MMLU, and CMMLU benchmarks.

02 Instruction fine-tuning significantly improves Chinese conversational abilities.

03 The approach demonstrates the effectiveness of instruction tuning on sparse expert models.

Abstract

Existing research has demonstrated that refining large language models (LLMs) through the utilization of machine-generated instruction-following data empowers these models to exhibit impressive zero-shot capabilities for novel tasks, without requiring human-authored instructions. In this paper, we systematically investigate, preprocess, and integrate three Chinese instruction-following datasets with the aim of enhancing the Chinese conversational capabilities of Mixtral-8x7B sparse Mixture-of-Experts model. Through instruction fine-tuning on this carefully processed dataset, we successfully construct the Mixtral-8x7B sparse Mixture-of-Experts model named "Aurora." To assess the performance of Aurora, we utilize three widely recognized benchmark tests: C-Eval, MMLU, and CMMLU. Empirical studies validate the effectiveness of instruction fine-tuning applied to Mixtral-8x7B sparse…
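
The preprocessing and integration step mentioned in the abstract can be pictured as mapping each source dataset onto one shared record schema, then dropping empty and duplicate entries. The sketch below is a minimal illustration under stated assumptions: the three source layouts, the field names (`instruction`, `input`, `output`, `question`, `answer`, `conversations`), and all function names are hypothetical and are not taken from the paper or its repository.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Record = Dict[str, str]  # assumed unified schema: instruction / input / output


def _clean(instruction: str, input_text: str, output: str) -> Optional[Record]:
    """Strip whitespace and reject records with no instruction or no output."""
    instruction, input_text, output = instruction.strip(), input_text.strip(), output.strip()
    if not instruction or not output:
        return None
    return {"instruction": instruction, "input": input_text, "output": output}


def from_alpaca_style(row: dict) -> Optional[Record]:
    """Hypothetical source A: rows already use instruction/input/output keys."""
    return _clean(row.get("instruction", ""), row.get("input", ""), row.get("output", ""))


def from_qa_style(row: dict) -> Optional[Record]:
    """Hypothetical source B: rows are question/answer pairs."""
    return _clean(row.get("question", ""), "", row.get("answer", ""))


def from_chat_style(row: dict) -> Optional[Record]:
    """Hypothetical source C: rows hold a list of turns; keep the first pair only."""
    turns = row.get("conversations", [])
    if len(turns) < 2:
        return None
    return _clean(turns[0].get("value", ""), "", turns[1].get("value", ""))


Converter = Callable[[dict], Optional[Record]]


def merge_sources(sources: Iterable[Tuple[Iterable[dict], Converter]]) -> List[Record]:
    """Convert every row to the unified schema and drop exact duplicate prompts."""
    seen = set()
    merged: List[Record] = []
    for rows, convert in sources:
        for row in rows:
            record = convert(row)
            if record is None:
                continue
            key = (record["instruction"], record["input"])
            if key in seen:  # same prompt already present from an earlier source
                continue
            seen.add(key)
            merged.append(record)
    return merged
```

Calling `merge_sources([(rows_a, from_alpaca_style), (rows_b, from_qa_style), (rows_c, from_chat_style)])` returns a single list in the unified schema. Deduplicating on the prompt alone is a design choice made for this sketch; a real pipeline might also filter by length or language.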

Code & Models

Repositories

WangRongsheng/Aurora
PyTorch · Official

Models

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning