mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration
Qinghao Ye, Haiyang Xu, Jiabo Ye, Ming Yan, Anwen Hu, Haowei Liu, Qi Qian, Ji Zhang, Fei Huang, Jingren Zhou

TL;DR
mPLUG-Owl2 is a versatile multi-modal large language model that leverages modality collaboration through a modular network design, achieving state-of-the-art performance in both text and multi-modal tasks.
Contribution
It introduces a novel modular architecture with shared and modality-adaptive modules, pioneering the use of modality collaboration in both pure-text and multi-modal scenarios.
Findings
Achieves state-of-the-art results on multiple benchmarks.
Demonstrates effective modality collaboration in both text and multi-modal tasks.
First model to demonstrate the modality collaboration phenomenon in both pure-text and multi-modal scenarios.
Abstract
Multi-modal Large Language Models (MLLMs) have demonstrated impressive instruction abilities across various open-ended tasks. However, previous methods primarily focus on enhancing multi-modal capabilities. In this work, we introduce a versatile multi-modal large language model, mPLUG-Owl2, which effectively leverages modality collaboration to improve performance in both text and multi-modal tasks. mPLUG-Owl2 utilizes a modularized network design, with the language decoder acting as a universal interface for managing different modalities. Specifically, mPLUG-Owl2 incorporates shared functional modules to facilitate modality collaboration and introduces a modality-adaptive module that preserves modality-specific features. Extensive experiments reveal that mPLUG-Owl2 is capable of generalizing both text tasks and multi-modal tasks and achieving state-of-the-art performances with a single…
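The split between shared modules and a modality-adaptive module can be illustrated with a small NumPy sketch. It is only one plausible reading of the abstract, not the authors' implementation: it assumes that each token carries a modality id (0 for text, 1 for visual), that layer normalization and the key/value projections are modality-specific, and that the query and output projections are shared. The class name `ModalityAdaptiveAttention`, its parameters, the single attention head, and the causal mask are all hypothetical choices made for illustration.

```python
import numpy as np


def layer_norm(x, gain, bias, eps=1e-5):
    """Normalize each token vector, then apply a per-token gain and bias."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps) * gain + bias


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class ModalityAdaptiveAttention:
    """Hypothetical single-head attention layer (illustrative assumption).

    Modality-specific: layer-norm parameters, key and value projections.
    Shared across modalities: query and output projections, attention itself.
    """

    def __init__(self, dim, num_modalities=2, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(dim)
        self.dim = dim
        # One set of parameters per modality (assumed: 0 = text, 1 = visual).
        self.gain = np.ones((num_modalities, dim))
        self.bias = np.zeros((num_modalities, dim))
        self.w_k = rng.normal(0.0, scale, (num_modalities, dim, dim))
        self.w_v = rng.normal(0.0, scale, (num_modalities, dim, dim))
        # Parameters shared by every token regardless of modality.
        self.w_q = rng.normal(0.0, scale, (dim, dim))
        self.w_o = rng.normal(0.0, scale, (dim, dim))

    def __call__(self, hidden, modality_ids):
        # hidden: (seq, dim) token features; modality_ids: (seq,) integers.
        seq = hidden.shape[0]
        normed = layer_norm(hidden, self.gain[modality_ids], self.bias[modality_ids])
        q = normed @ self.w_q
        # Each token is projected with the key/value matrix of its own modality.
        k = np.einsum("sd,sde->se", normed, self.w_k[modality_ids])
        v = np.einsum("sd,sde->se", normed, self.w_v[modality_ids])
        scores = q @ k.T / np.sqrt(self.dim)
        # Causal mask, as in a decoder: a token cannot attend to later tokens.
        future = np.triu(np.ones((seq, seq), dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
        attended = softmax(scores, axis=-1) @ v
        # Residual connection around the attention block.
        return hidden + attended @ self.w_o


layer = ModalityAdaptiveAttention(dim=8)
tokens = np.random.default_rng(1).normal(size=(5, 8))
modality_ids = np.array([1, 1, 0, 0, 0])  # two visual tokens, then three text tokens
output = layer(tokens, modality_ids)
print(output.shape)
```

In this sketch, text and visual tokens still attend to each other through the shared attention computation, while the per-modality normalization and key/value projections give each modality its own parameters before that interaction takes place.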
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Focus
