MindGPT-4ov: An Enhanced MLLM via a Multi-Stage Post-Training Paradigm

Wei Chen; Chaoqun Du; Feng Gu; Wei He; Qizhen Li; Zide Liu; Xuhao Pan; Chang Ren; Xudong Rao; Chenfeng Wang; Tao Wei; Chengjun Yu; Pengfei Yu; Yufei Zheng; Chunpeng Zhou; Pan Zhou; Xuhan Zhu

arXiv:2512.02895 · cs.CV · December 4, 2025

TL;DR

MindGPT-4ov introduces a comprehensive post-training paradigm for multimodal large language models, significantly improving performance, efficiency, and generalization across diverse benchmarks and applications.

Contribution

It proposes novel data generation, fine-tuning, and reinforcement learning strategies, along with infrastructure optimizations, to enhance MLLMs at low cost.

Findings

01

Achieves state-of-the-art results on multiple benchmarks.

02

Demonstrates superior performance in domain-specific tasks.

03

Reduces training and inference costs through infrastructure improvements.

Abstract

We present MindGPT-4ov, a multimodal large language model (MLLM) that introduces a general post-training paradigm spanning data production, model training, and efficient deployment. It achieves state-of-the-art performance across multiple benchmarks at low cost, effectively enhancing both the foundational capabilities and the generalization ability of MLLMs. Focusing on data construction, supervised fine-tuning strategies, and multimodal reinforcement learning methods, this work proposes three key innovations: (1) An information density-based data generation scheme, integrated with a dual-dimensional tree-structured label system, which enables automated generation of high-quality cross-domain data. (2) A collaborative curriculum supervised fine-tuning approach that balances the injection of domain-specific knowledge with the preservation of general capabilities. (3) A hybrid reinforcement…

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Advanced Graph Neural Networks