Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction

Inclusion AI; Biao Gong; Cheng Zou; Dandan Zheng; Hu Yu; Jingdong Chen; Jianxin Sun; Junbo Zhao; Jun Zhou; Kaixiang Ji; Lixiang Ru; Libin Wang; Qingpei Guo; Rui Liu; Weilong Chai; Xinyu Xiao; Ziyuan Huang

arXiv:2505.02471 · cs.CV · June 16, 2025

TL;DR

Ming-Lite-Uni introduces a unified multimodal framework that combines vision and language models for versatile tasks like image generation and editing, with open-source code and promising experimental results.

Contribution

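The paper presents a unified architecture built on multi-scale learnable tokens and a multi-scale representation alignment strategy, extending a native multimodal autoregressive model from visual understanding to text-to-image generation and instruction-based image editing.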

Findings

01 Strong performance demonstrated in experiments
02 Fluid interactive process observed
03 Open-source implementation available

Abstract

We introduce Ming-Lite-Uni, an open-source multimodal framework featuring a newly designed unified visual generator and a native multimodal autoregressive model tailored for unifying vision and language. Specifically, this project provides an open-source implementation of the integrated MetaQueries and M2-omni framework, while introducing novel multi-scale learnable tokens and a multi-scale representation alignment strategy. By leveraging a fixed MLLM and a learnable diffusion model, Ming-Lite-Uni enables native multimodal AR models to perform both text-to-image generation and instruction-based image editing tasks, expanding their capabilities beyond pure visual understanding. Our experimental results demonstrate the strong performance of Ming-Lite-Uni and illustrate the impressive fluid nature of its interactive process. All code and model weights are open-sourced to foster further…
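
The abstract names multi-scale learnable tokens but does not spell out how they are laid out. The following is a minimal illustrative sketch, not the authors' implementation: it assumes one square grid of query tokens per scale (the sides 4, 8, and 16 are hypothetical values chosen for the example) and computes where each scale's tokens would sit in a single flat sequence. The names `ScaleSpan` and `multiscale_layout` are invented for this illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ScaleSpan:
    """Position of one scale's query tokens inside the flat token sequence."""
    side: int   # grid side length for this scale (assumed square grid)
    start: int  # index of the first token of this scale
    end: int    # one past the index of the last token of this scale


def multiscale_layout(sides: List[int]) -> List[ScaleSpan]:
    """Lay out side*side tokens per scale, coarse to fine, back to back."""
    spans = []
    offset = 0
    for side in sides:
        count = side * side
        spans.append(ScaleSpan(side=side, start=offset, end=offset + count))
        offset += count
    return spans


# Hypothetical scales; the source does not state the values used in the paper.
layout = multiscale_layout([4, 8, 16])
total_tokens = layout[-1].end

for span in layout:
    print(f"{span.side}x{span.side}: tokens [{span.start}, {span.end})")
print("total query tokens:", total_tokens)
```

Under these assumed scales the sequence holds 16 + 64 + 256 = 336 query tokens. In a setup like the one the abstract describes, such tokens would be the learnable part fed alongside a fixed MLLM, with their outputs conditioning the diffusion model; that wiring is not shown here.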

Code & Models

Repositories

inclusionai/ming (PyTorch, official)

Models

inclusionAI/Ming-Lite-Uni (Hugging Face model · ♡ 28)

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Face recognition and analysis

Methods: Diffusion