MiniCPM4: Ultra-Efficient LLMs on End Devices
MiniCPM Team: Chaojun Xiao, Yuxuan Li, Xu Han, Yuzhuo Bai, Jie Cai, Haotian Chen, Wentong Chen, Xin Cong, Ganqu Cui, Ning Ding, Shengda Fan, Yewei Fang, Zixuan Fu, Wenyu Guan, Yitong Guan, Junshao Guo, Yufeng Han, Bingxiang He, Yuxiang Huang, Baoxi Ji, Cunliang Kong, Qiuzuo Li

TL;DR
MiniCPM4 is a highly efficient LLM designed for end devices. It combines innovations in architecture, training data, training algorithms, and inference systems to deliver fast, accurate performance in both a small (0.5B) and a large (8B) variant.
Contribution
The paper introduces MiniCPM4, an LLM for on-device deployment that combines a trainable sparse attention mechanism (InfLLM v2), new training datasets (UltraClean and UltraChat v2), and optimized training and inference algorithms.
Findings
Outperforms similar-sized open-source models on benchmarks.
Achieves significant speed improvements on long sequence tasks.
Runs efficiently at both model sizes, 0.5B and 8B parameters.
Abstract
This paper introduces MiniCPM4, a highly efficient large language model (LLM) designed explicitly for end-side devices. We achieve this efficiency through systematic innovation in four key dimensions: model architecture, training data, training algorithms, and inference systems. Specifically, in terms of model architecture, we propose InfLLM v2, a trainable sparse attention mechanism that accelerates both prefilling and decoding phases for long-context processing. Regarding training data, we propose UltraClean, an efficient and accurate pre-training data filtering and generation strategy, and UltraChat v2, a comprehensive supervised fine-tuning dataset. These datasets enable satisfactory model performance to be achieved using just 8 trillion training tokens. Regarding training algorithms, we propose ModelTunnel v2 for efficient pre-training strategy search, and improve existing…
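To make the idea of sparse attention concrete, here is a minimal sketch of a single decoding step in which a query attends only to the few key blocks judged most relevant. It is a generic top-k block-selection scheme, not the InfLLM v2 algorithm, whose details are not given in this summary. The function name `block_sparse_attention`, the mean-key block score, and the `block_size` and `top_k` parameters are all assumptions made for illustration.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def block_sparse_attention(query, keys, values, block_size, top_k):
    """Attend from one query to the top_k highest-scoring key blocks only.

    Hypothetical illustration: block relevance is the dot product between
    the query and the mean key of each block (an assumed heuristic).
    """
    dim = len(query)
    scale = 1.0 / math.sqrt(dim)

    # Split cached positions into contiguous blocks.
    blocks = [
        list(range(start, min(start + block_size, len(keys))))
        for start in range(0, len(keys), block_size)
    ]

    def block_score(positions):
        mean_key = [
            sum(keys[i][d] for i in positions) / len(positions)
            for d in range(dim)
        ]
        return dot(query, mean_key)

    # Rank blocks by relevance and keep the top_k, restored to sequence order.
    ranked = sorted(
        range(len(blocks)), key=lambda b: block_score(blocks[b]), reverse=True
    )
    selected = sorted(ranked[:top_k])
    positions = [i for b in selected for i in blocks[b]]

    # Ordinary softmax attention, restricted to the selected positions.
    weights = softmax([dot(query, keys[i]) * scale for i in positions])
    output = [
        sum(w * values[i][d] for w, i in zip(weights, positions))
        for d in range(len(values[0]))
    ]
    return output, selected


# Toy example: four cached positions, two blocks of two positions each.
query = [1.0, 0.0]
keys = [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0]]
values = [[1.0], [1.0], [5.0], [5.0]]
sparse_out, sparse_blocks = block_sparse_attention(query, keys, values, 2, 1)
dense_out, dense_blocks = block_sparse_attention(query, keys, values, 2, 2)
```

With `top_k` equal to the number of blocks, the function reduces to dense attention. Smaller values of `top_k` reduce the number of key and value vectors read per step, which is the general reason sparse attention can speed up long-context prefilling and decoding.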
Code & Models
- 🤗 openbmb/MiniCPM4-8B · model · 636 dl · ♡ 283
- 🤗 openbmb/MiniCPM4-Survey · model · 44 dl · ♡ 13
- 🤗 JunHowie/MiniCPM4-8B · model · 2 dl
- 🤗 QuantFactory/MiniCPM4-8B-GGUF · model · 29 dl · ♡ 2
- 🤗 openbmb/MiniCPM4.1-8B · model · 30k dl · ♡ 384
- 🤗 openbmb/MiniCPM4.1-8B-MLX · model · 30 dl · ♡ 2
- 🤗 openbmb/MiniCPM4.1-8B-GPTQ · model · 508 dl
- 🤗 openbmb/MiniCPM4.1-8B-AutoAWQ · model · 22 dl
- 🤗 openbmb/MiniCPM4.1-8B-GGUF · model · 97 dl · ♡ 16
- 🤗 openbmb/MiniCPM4.1-8B-Eagle3 · model · ♡ 4
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Softmax · Attention Is All You Need · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
