JetMoE: Reaching Llama2 Performance with 0.1M Dollars

Yikang Shen; Zhen Guo; Tianle Cai; Zengyi Qin

arXiv:2404.07413·cs.CL·April 12, 2024·3 cites

TL;DR

JetMoE-8B is a cost-effective large language model trained for less than $0.1 million. It achieves performance comparable to larger models through an efficient sparsely-gated Mixture-of-Experts architecture, and it emphasizes openness and reproducibility.

Contribution

The paper introduces JetMoE-8B, a cost-efficient LLM that uses a sparsely-gated MoE architecture and is trained on publicly available data, and it reports its details transparently to promote open research.

Findings

01 JetMoE-8B outperforms Llama2-7B.

02 JetMoE-8B-Chat surpasses Llama2-13B-Chat.

03 Training cost stays under $0.1 million (30,000 H100 GPU hours) while matching or exceeding these baselines.
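A rough back-of-the-envelope check of the cost claim, using only the figures quoted in the abstract (less than $0.1 million, 30,000 H100 GPU hours, 1.25T tokens, 8B total and 2B active parameters). The budget is treated as an upper bound, so the per-hour rate below is an implied ceiling, not a price the authors report. The function and constant names are illustrative.

```python
# Figures quoted in the abstract. BUDGET_USD is an upper bound
# ("less than $0.1 million"), not an exact spend.
BUDGET_USD = 100_000
GPU_HOURS = 30_000
TRAINING_TOKENS = 1.25e12
TOTAL_PARAMS = 8e9
ACTIVE_PARAMS = 2e9


def implied_rate_per_gpu_hour(budget_usd, gpu_hours):
    """Ceiling on the H100 hourly rate implied by the budget."""
    return budget_usd / gpu_hours


def tokens_per_gpu_hour(tokens, gpu_hours):
    """Average training throughput per GPU hour."""
    return tokens / gpu_hours


def active_fraction(active_params, total_params):
    """Share of parameters activated for each input."""
    return active_params / total_params


if __name__ == "__main__":
    rate = implied_rate_per_gpu_hour(BUDGET_USD, GPU_HOURS)
    throughput = tokens_per_gpu_hour(TRAINING_TOKENS, GPU_HOURS)
    share = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
    print(f"Implied ceiling: ${rate:.2f} per H100 hour")
    print(f"Throughput: {throughput / 1e6:.1f}M tokens per GPU hour")
    print(f"Active parameters per input: {share:.0%}")
```

These are averages over the whole run. They say nothing about how the budget was split between compute, storage, or failed runs.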

Abstract

Large Language Models (LLMs) have achieved remarkable results, but their increasing resource demand has become a major obstacle to the development of powerful and accessible super-human intelligence. This report introduces JetMoE-8B, a new LLM trained with less than $0.1 million, using 1.25T tokens from carefully mixed open-source corpora and 30,000 H100 GPU hours. Despite its low cost, JetMoE-8B performs strongly: the base model outperforms Llama2-7B, and JetMoE-8B-Chat surpasses Llama2-13B-Chat. These results suggest that LLM training can be much more cost-effective than generally thought. JetMoE-8B is based on an efficient Sparsely-gated Mixture-of-Experts (SMoE) architecture, composed of attention and feedforward experts. Both layers are sparsely activated, allowing JetMoE-8B to have 8B parameters while only activating 2B for each input…
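To make "sparsely activated" concrete, here is a minimal sketch of generic top-k expert routing in plain Python. It is not the JetMoE implementation: the expert count, the value of k, the toy experts, and the hand-written router logits are all assumptions chosen for illustration. In a real model the logits would come from a learned router applied to the input.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and renormalise their weights."""
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    chosen = order[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def sparse_moe(x, experts, router_logits, k):
    """Weighted sum of the outputs of the selected experts only.

    Experts that are not selected are never called, which is where the
    compute saving comes from.
    """
    out = [0.0] * len(x)
    for index, weight in top_k_route(router_logits, k):
        y = experts[index](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out


# Hypothetical toy experts: each one just scales its input.
toy_experts = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0, 4.0)]

if __name__ == "__main__":
    logits = [0.1, 2.0, -1.0, 2.0]  # hand-written stand-in for router output
    print(top_k_route(logits, k=2))
    print(sparse_moe([1.0, -2.0], toy_experts, logits, k=2))
```

With k = 2 out of 4 experts, only half of the expert parameters are touched per input. The same mechanism, at a different ratio, lets a model hold far more parameters than it activates.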

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification