Xuanwu: Evolving General Multimodal Models into an Industrial-Grade Foundation for Content Ecosystems

Zhiqian Zhang; Xu Zhao; Xiaoqing Xu; Guangdong Liang; Weijia Wang; Xiaolei Lv; Bo Li; Jun Gao

arXiv:2603.29211 · cs.AI · April 1, 2026


TL;DR

Xuanwu VL-2B is an industrial-grade multimodal foundation model optimized for content moderation, balancing visual perception, language alignment, and deployment costs within a 2B-parameter budget.

Contribution

The paper introduces a new multimodal model with a specialized training pipeline and data curation mechanism for industrial content ecosystems.

Findings

01. Xuanwu VL-2B outperforms existing models on multimodal benchmarks.

02. It achieves high recall on business moderation tasks.

03. It balances general capabilities with deployment efficiency.

Abstract

In recent years, multimodal large models have continued to improve on general benchmarks. However, in real-world content moderation and adversarial settings, mainstream models still suffer from degraded generalization and catastrophic forgetting because of limited fine-grained visual perception and insufficient modeling of long-tail noise. In this paper, we present Xuanwu VL-2B as a case study of how general multimodal models can be developed into an industrial-grade foundation model for content ecosystems. The model adopts a compact InternViT-300M + MLP + Qwen3 1.7B architecture, balancing fine-grained visual perception, language-semantic alignment, and deployment cost within an approximately 2B-parameter budget. To balance business specialization with the retention of general capabilities, we developed a data iteration and curation mechanism and trained the model through a progressive…
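The "approximately 2B-parameter budget" in the abstract can be checked with back-of-the-envelope arithmetic. The sketch below is illustrative only: the 300M and 1.7B figures are the nominal sizes taken from the component names, while the two-layer projector shape and the hidden widths (`VISION_DIM`, `LLM_DIM`) are assumptions that the abstract does not state.

```python
# Rough parameter budget for a ViT + MLP projector + LLM stack.
# Nominal component sizes come from the model names in the abstract;
# the projector shape and hidden widths below are ASSUMED for illustration.

def mlp_projector_params(in_dim: int, hidden_dim: int, out_dim: int) -> int:
    """Count weights and biases of a two-layer MLP (in -> hidden -> out)."""
    first_layer = in_dim * hidden_dim + hidden_dim
    second_layer = hidden_dim * out_dim + out_dim
    return first_layer + second_layer

VISION_PARAMS = 300_000_000    # nominal size implied by "InternViT-300M"
LLM_PARAMS = 1_700_000_000     # nominal size implied by "Qwen3 1.7B"

VISION_DIM = 1024              # assumed vision feature width (hypothetical)
LLM_DIM = 2048                 # assumed language model width (hypothetical)

projector_params = mlp_projector_params(VISION_DIM, LLM_DIM, LLM_DIM)
total_params = VISION_PARAMS + projector_params + LLM_PARAMS

print(f"projector: {projector_params / 1e6:.1f}M parameters")
print(f"total:     {total_params / 1e9:.2f}B parameters")
```

Under these assumed widths the projector adds only a few million parameters, so the total is dominated by the vision encoder and the language model, which is consistent with the roughly 2B figure quoted above.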


Code & Models

Model (Hugging Face): hellogroup-opensource/Xuanwu
