MiniCPM-V: A GPT-4V Level MLLM on Your Phone
Yuan Yao, Tianyu Yu, Ao Zhang, Chongyi Wang, Junbo Cui, Hongji Zhu, Tianchi Cai, Haoyu Li, Weilin Zhao, Zhihui He, Qianyu Chen, Huarong Zhou, Zhensheng Zou, Haoye Zhang, Shengding Hu, Zhi Zheng, Jie Zhou, Jie Cai, Xu Han, Guoyang Zeng, Dahai Li, Zhiyuan Liu, Maosong Sun

TL;DR
MiniCPM-V is a series of efficient multimodal large language models (MLLMs) deployable on end-side devices such as mobile phones, offering GPT-4V-level performance, multilingual support, and high-resolution image perception.
Contribution
The paper introduces MiniCPM-V, a series of efficient MLLMs that can be deployed on mobile devices, showing that strong multimodal performance is achievable in resource-constrained environments.
Findings
Outperforms GPT-4V-1106, Gemini Pro, and Claude 3 on benchmarks
Supports high-resolution image perception and multilingual capabilities
Enables deployment of GPT-4V level models on mobile devices
Abstract
The recent surge of Multimodal Large Language Models (MLLMs) has fundamentally reshaped the landscape of AI research and industry, shedding light on a promising path toward the next AI milestone. However, significant challenges remain preventing MLLMs from being practical in real-world applications. The most notable challenge comes from the huge cost of running an MLLM with a massive number of parameters and extensive computation. As a result, most MLLMs need to be deployed on high-performing cloud servers, which greatly limits their application scopes such as mobile, offline, energy-sensitive, and privacy-protective scenarios. In this work, we present MiniCPM-V, a series of efficient MLLMs deployable on end-side devices. By integrating the latest MLLM techniques in architecture, pretraining and alignment, the latest MiniCPM-Llama3-V 2.5 has several notable features: (1) Strong…
Code & Models
- 🤗 openbmb/MiniCPM-o-4_5 · model · 22k dl · ♡ 920
- 🤗 openbmb/MiniCPM-o-4_5-gguf · model · 16k dl · ♡ 110
- 🤗 openbmb/MiniCPM-V-2_6 · model · 131k dl · ♡ 1035
- 🤗 openbmb/MiniCPM-o-2_6 · model · 110k dl · ♡ 1285
- 🤗 openbmb/MiniCPM-V-2 · model · 78k dl · ♡ 495
- 🤗 RhapsodyAI/MiniCPM-V-Embedding-preview · model · 39 dl · ♡ 50
- 🤗 lei-HuggingFace/MinCPM-V2_6_Level_Image_08162024 · model · 3 dl
- 🤗 jchevallard/MiniCPM-V-2_6 · model · 24 dl · ♡ 1
- 🤗 fredaddy/MiniCPM-v-2_6 · model · 2 dl
- 🤗 fredaddy/MiniCPM-V-2_6-Deployable · model · 13 dl
Taxonomy
Topics: Advanced Data Storage Technologies · Algorithms and Data Compression
Methods: Adam · 1-bit Adam
