MiniCPM4: Ultra-Efficient LLMs on End Devices

MiniCPM Team: Chaojun Xiao; Yuxuan Li; Xu Han; Yuzhuo Bai; Jie Cai; Haotian Chen; Wentong Chen; Xin Cong; Ganqu Cui; Ning Ding; Shengda Fan; Yewei Fang; Zixuan Fu; Wenyu Guan; Yitong Guan; Junshao Guo; Yufeng Han; Bingxiang He; Yuxiang Huang; Baoxi Ji; Cunliang Kong; Qiuzuo Li; Siyuan Li; Wenhao Li; Xin Li; Yanghao Li; Yishan Li; Zhen Li; Dan Liu; Biyuan Lin; Yankai Lin; Xiang Long; Quanyu Lu; Yaxi Lu; Peiyan Luo; Hongya Lyu; Litu Ou; Yinxu Pan; Lushi Pu; Zekai Qu; Qundong Shi; Zijun Song; Jiayuan Su; Zhou Su; Ao Sun; Xianghui Sun; Peijun Tang; Fangzheng Wang; Feng Wang; Shuo Wang; Yudong Wang; Zheng Wang; Yesai Wu; Zhenyu Xiao; Jie Xie; Zihao Xie; Xiaoyue Xu; Yukun Yan; Jiarui Yuan; Jinqian Zhang; Kaihuo Zhang; Lei Zhang; Linyue Zhang; Xueren Zhang; Yudi Zhang; Hengyu Zhao; Weilin Zhao; Weilun Zhao; Yuanqian Zhao; Zhi Zheng; Chuyue Zhou; Ge Zhou; Jie Zhou; Wei Zhou; Yanghao Zhou; Zihan Zhou; Zixuan Zhou; Zhiyuan Liu; Guoyang Zeng; Chao Jia; Dahai Li; Maosong Sun

arXiv:2506.07900 · cs.CL · September 5, 2025

TL;DR

MiniCPM4 is a highly efficient LLM designed for end devices. It combines innovations in model architecture, training data, training algorithms, and inference systems to deliver fast, accurate performance at two model sizes, 0.5B and 8B parameters.

Contribution

The paper introduces MiniCPM4, an LLM for on-device deployment that combines a trainable sparse attention mechanism (InfLLM v2), new training datasets (UltraClean and UltraChat v2), and optimized training and inference algorithms.

Findings

01 Outperforms similar-sized open-source models on benchmarks.

02 Achieves significant speed improvements on long-sequence tasks.

03 Operates efficiently at two model sizes, 0.5B and 8B parameters.

Abstract

This paper introduces MiniCPM4, a highly efficient large language model (LLM) designed explicitly for end-side devices. We achieve this efficiency through systematic innovation in four key dimensions: model architecture, training data, training algorithms, and inference systems. Specifically, in terms of model architecture, we propose InfLLM v2, a trainable sparse attention mechanism that accelerates both prefilling and decoding phases for long-context processing. Regarding training data, we propose UltraClean, an efficient and accurate pre-training data filtering and generation strategy, and UltraChat v2, a comprehensive supervised fine-tuning dataset. These datasets enable satisfactory model performance to be achieved using just 8 trillion training tokens. Regarding training algorithms, we propose ModelTunnel v2 for efficient pre-training strategy search, and improve existing…
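
To make the idea of sparse attention concrete, here is a minimal sketch in plain Python. It is not InfLLM v2: the abstract does not describe how that mechanism selects context, and nothing here is trainable. The sketch assumes a generic top-k block selection scheme, where keys are grouped into fixed-size blocks, each block is scored by the dot product between the query and the block's mean key, and softmax attention is computed only over tokens in the highest-scoring blocks. The names `select_blocks` and `sparse_attention` and the parameters `block_size` and `top_k` are hypothetical.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_blocks(query, keys, block_size, top_k):
    """Return sorted indices of the top_k key blocks.

    Assumption for illustration: a block's relevance is the dot product
    between the query and the mean of the keys in that block.
    """
    n_blocks = math.ceil(len(keys) / block_size)
    scored = []
    for b in range(n_blocks):
        block = keys[b * block_size:(b + 1) * block_size]
        mean_key = [sum(col) / len(block) for col in zip(*block)]
        scored.append((dot(query, mean_key), b))
    # Highest score first; ties broken by lower block index.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return sorted(b for _, b in scored[:top_k])


def sparse_attention(query, keys, values, block_size, top_k):
    """Scaled dot-product attention restricted to the selected blocks.

    Returns the output vector and the token indices that were attended to.
    """
    blocks = select_blocks(query, keys, block_size, top_k)
    idx = [
        i
        for b in blocks
        for i in range(b * block_size, min((b + 1) * block_size, len(keys)))
    ]
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, keys[i]) * scale for i in idx])
    dim = len(values[0])
    out = [sum(w * values[i][d] for w, i in zip(weights, idx)) for d in range(dim)]
    return out, idx


if __name__ == "__main__":
    keys = [[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [-1.0, 0.0]]
    values = [[1.0, 0.0], [3.0, 0.0], [100.0, 0.0], [100.0, 0.0]]
    out, idx = sparse_attention([1.0, 0.0], keys, values, block_size=2, top_k=1)
    print(idx, out)
```

In this sketch the attention cost grows with `top_k * block_size` rather than with the full context length, which is the general reason sparse attention can speed up long-sequence processing. Setting `top_k` to the total number of blocks recovers ordinary dense attention.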

Code & Models

Repositories

openbmb/minicpm
PyTorch · Official

Datasets

openbmb/Ultra-FineWeb
Dataset · 27k downloads

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Softmax · Attention Is All You Need · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings