GEB-1.3B: Open Lightweight Large Language Model
Jie Wu, Yufeng Zhu, Lei Shen, Xuqing Lu

TL;DR
GEB-1.3B is an open-source, lightweight large language model optimized for CPU inference. It combines efficient training techniques with fine-tuning to reach competitive benchmark performance, which makes it suitable for deployment without high-performance servers.
Contribution
This work introduces GEB-1.3B, a resource-efficient LLM, reports its strong benchmark performance, and releases the model as open source for lightweight NLP applications.
Findings
Outperforms models like MindLLM-1.3B and TinyLLaMA-1.1B on benchmarks.
The FP32 version achieves practical inference times on CPUs.
Trains with RoPE, grouped-query attention, and FlashAttention-2 to accelerate training.
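Two of the techniques named above can be illustrated in a few lines. The following is a minimal sketch in pure Python, not the GEB-1.3B implementation: `rope` applies rotary position embedding to a single vector, and `kv_head_for` shows how grouped-query attention maps several query heads onto one shared key/value head. The function names, the base of 10000, and the head counts in the usage lines (8 query heads, 2 KV heads) are assumptions chosen for illustration; this summary does not state the model's actual configuration.

```python
import math


def rope(vec, position, base=10000.0):
    """Rotary position embedding for one vector (illustrative sketch).

    Consecutive pairs (x0, x1), (x2, x3), ... are rotated by an angle
    that grows with the token position and shrinks with the pair index.
    """
    dim = len(vec)
    assert dim % 2 == 0, "RoPE needs an even dimension"
    out = []
    for i in range(0, dim, 2):
        theta = position * base ** (-i / dim)  # angle for this pair
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    """Plain dot product, standing in for an attention score."""
    return sum(x * y for x, y in zip(a, b))


def kv_head_for(query_head, num_query_heads, num_kv_heads):
    """Grouped-query attention: index of the KV head a query head shares."""
    assert num_query_heads % num_kv_heads == 0
    group_size = num_query_heads // num_kv_heads
    return query_head // group_size


# Hypothetical values, for illustration only.
q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.0]

# The score depends only on the position offset (2 in both cases).
score_a = dot(rope(q, 5), rope(k, 3))
score_b = dot(rope(q, 12), rope(k, 10))

# With 8 query heads and 2 KV heads, each KV head serves 4 query heads.
mapping = [kv_head_for(h, 8, 2) for h in range(8)]
```

Because RoPE encodes position as a rotation, `score_a` and `score_b` agree up to floating-point error, which is why the attention score reflects relative rather than absolute position. In the grouped-query mapping, fewer KV heads mean a smaller key/value cache at inference time, which is one reason the technique is attractive for lightweight models.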
Abstract
Recently developed large language models (LLMs) such as ChatGPT, Claude, and Llama have demonstrated impressive abilities, and even surpass human-level performance in several tasks. Despite their success, the resource-intensive demands of these models, requiring significant computational power for both training and inference, limit their deployment to high-performance servers. Additionally, the extensive calculation requirements of the models often lead to increased latency in response times. With the increasing need for LLMs to operate efficiently on CPUs, research on lightweight models that are optimized for CPU inference has emerged. In this work, we introduce GEB-1.3B, a lightweight LLM trained on 550 billion tokens in both Chinese and English. We employ novel training techniques, including RoPE, Grouped-Query Attention, and FlashAttention-2, to accelerate training while…
Taxonomy
Topics: Topic Modeling · Mathematics, Computing, and Information Processing · Computational Physics and Python Applications
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · LLaMA
