MiniCPM-o 4.5: Towards Real-Time Full-Duplex Omni-Modal Interaction
Junbo Cui, Bokai Xu, Chongyi Wang, Tianyu Yu, Weiyue Sun, Yingjing Xu, Tianran Wang, Zhihui He, Wenshuo Ma, Tianchi Cai, Jiancheng Gui, Luoyuan Zhang, Xian Sun, Fuwei Huang, Moye Chen, Zhuo Lin, Hanyu Liu, Qingxin Gui, Qingzhe Han, Yuyang Wen, Huiping Liu, Rongkang Wang

TL;DR
MiniCPM-o 4.5 introduces a real-time, full-duplex omni-modal interaction framework that perceives and responds simultaneously and can act proactively, surpassing existing models on vision-language and speech tasks while remaining small (9B parameters).
Contribution
The paper presents Omni-Flow, a unified streaming framework for full-duplex multimodal interaction, enabling simultaneous perception and response in a single, time-aligned process.
Findings
MiniCPM-o 4.5 achieves state-of-the-art vision-language performance at 9B parameters.
It surpasses Qwen3-Omni-30B-A3B in omni-modal understanding.
The model supports real-time interaction on edge devices with less than 12 GB of RAM.
Abstract
Recent progress in multimodal large language models (MLLMs) has moved AI capabilities from static offline data processing to real-time streaming interaction, yet these models remain far from human-level multimodal interaction. The key bottlenecks are no longer modality coverage or latency alone, but the interaction paradigm itself. First, perception and response are still separated into alternating phases, preventing models from incorporating new inputs for timely adjustment during generation. Second, most current models remain reactive, responding only to explicit user requests instead of acting proactively in the evolving multimodal environment. We present MiniCPM-o 4.5, our latest effort towards human-like multimodal interaction, which narrows these gaps through real-time full-duplex omni-modal interaction. It can see, listen, and speak simultaneously in real-time, while also exhibiting…
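To make the contrast between alternating perceive-then-respond phases and full-duplex interaction concrete, here is a minimal sketch, assuming a simple tick-based timeline. All names (`Tick`, `duplex_loop`, `toy_policy`) and the string-labelled inputs are hypothetical illustrations, not the Omni-Flow API or the model's actual mechanism. At every tick the loop both ingests the newest input chunk and lets the model decide whether to emit output, so a user interruption arriving mid-answer takes effect immediately, and the model can speak without an explicit request.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Tick:
    """One hypothetical time slice: what arrived and what the model emitted."""
    t: int
    inputs: List[str]
    output: Optional[str]


def duplex_loop(
    input_stream: Iterable[List[str]],
    respond: Callable[[List[str]], Optional[str]],
) -> List[Tick]:
    """Hypothetical time-aligned full-duplex loop (illustrative only).

    Perception and response share each tick: the model never stops
    listening while it is speaking.
    """
    context: List[str] = []
    timeline: List[Tick] = []
    for t, chunk in enumerate(input_stream):
        context.extend(chunk)      # perceive: absorb this tick's inputs
        out = respond(context)     # respond: speak now, or stay silent (None)
        timeline.append(Tick(t, chunk, out))
    return timeline


def toy_policy(context: List[str]) -> Optional[str]:
    """Hypothetical stand-in for the model's speak/listen decision."""
    last = context[-1] if context else None
    if last == "user: stop":
        return None                        # barge-in: halt output mid-answer
    if last == "video: smoke":
        return "warning: I see smoke"      # proactive: no explicit request
    if last is not None and last.startswith("user: ") and last.endswith("?"):
        return "speaking..."               # keep answering the open question
    return None


# Toy stream: a question, a silent tick, an interruption, then a visual event.
stream = [
    ["user: what is this?"],
    [],
    ["user: stop"],
    ["video: smoke"],
]
timeline = duplex_loop(stream, toy_policy)
for tick in timeline:
    print(tick.t, tick.inputs, tick.output)
```

In a turn-based (half-duplex) loop, the `"user: stop"` chunk would only be read after the answer finished; here it silences output on the very tick it arrives, and the `"video: smoke"` event triggers speech with no user prompt.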
Code & Models
- 🤗 openbmb/MiniCPM-V-4.6 (model · 222k downloads · 907 likes)
- 🤗 openbmb/MiniCPM-V-4.6-gguf (model · 26k downloads · 27 likes)
- 🤗 openbmb/MiniCPM-V-4.6-Thinking (model · 29k downloads · 23 likes)
- 🤗 openbmb/MiniCPM-V-4.6-Thinking-gguf (model · 12k downloads · 14 likes)
- 🤗 openbmb/MiniCPM-o-4_5 (model · 160k downloads · 1377 likes)
- 🤗 heretic-org/MiniCPM-V-4.6-Thinking-heretic (model · 166 downloads · 2 likes)
- 🤗 openbmb/MiniCPM-V-4.6-BNB (model · 1.3k downloads · 6 likes)
- 🤗 heretic-org/MiniCPM-V-4.6-heretic (model · 158 downloads · 2 likes)
- 🤗 ZENLLC/ZEN-MiniCPM-V-4.6 (model · 17 downloads · 1 like)
- 🤗 openbmb/MiniCPM-V-4.6-AWQ (model · 1.9k downloads · 3 likes)
