Mosaic: Towards Efficient Training of Multimodal Models with Spatial Resource Multiplexing

Yanbo Wang; Yuxuan Wang; Chen Chen; Chunyu Xue; Yu Feng; Anbang Wu; Quan Chen; Yin Chen; Qizhen Weng

arXiv:2605.18710·cs.DC·May 19, 2026

TL;DR

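This paper introduces Apollo, a multimodal model (MM) training system that applies temporal-spatial multiplexing: multiple MM modules are colocated on a GPU under well-controlled resource quotas, rather than giving each GPU to a single module at a time. It achieves up to 1.31x training speedup.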

Contribution

It proposes temporal-spatial multiplexing for MM deployment, together with a flexible and lightweight execution engine that supports MM training with arbitrary resource quotas.

Findings

01. Achieves up to 1.31x training speedup on popular MMs.

02. Develops a performance model to estimate execution time under different resource plans.

03. Uses heuristics to derive high-quality deployment plans efficiently.
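
The intuition behind findings 02 and 03 can be illustrated with a toy sketch. This is not Apollo's performance model or search heuristic, neither of which is described on this page. Everything below is a hypothetical assumption made for illustration: the `Module` fields, the saturation-based cost formula, the idea that two modules can run concurrently on one GPU, and the grid search over quotas.

```python
from dataclasses import dataclass


@dataclass
class Module:
    """Hypothetical description of one MM module (illustrative only)."""
    name: str
    work: float        # time the module would take if it could use 100% of the GPU
    saturation: float  # largest GPU fraction the module can keep busy, in (0, 1]


def run_time(module: Module, quota: float) -> float:
    """Toy cost model: extra quota beyond the saturation point is wasted."""
    return module.work / min(quota, module.saturation)


def temporal_time(modules: list[Module]) -> float:
    """Temporal multiplexing: modules take turns, each owning the whole GPU."""
    return sum(run_time(m, 1.0) for m in modules)


def best_spatial_plan(a: Module, b: Module, steps: int = 20) -> tuple[float, float]:
    """Grid-search the quota given to `a` (the rest goes to `b`).

    Assumes the two modules can run concurrently, so the step finishes
    when the slower of the two finishes. Returns (quota_for_a, time).
    """
    best = None
    for i in range(1, steps):
        quota = i / steps
        t = max(run_time(a, quota), run_time(b, 1.0 - quota))
        if best is None or t < best[1]:
            best = (quota, t)
    return best


if __name__ == "__main__":
    encoder = Module("encoder", work=0.3, saturation=0.4)
    llm = Module("llm", work=0.6, saturation=0.8)
    quota, spatial = best_spatial_plan(encoder, llm)
    print("temporal:", temporal_time([encoder, llm]))
    print("spatial :", spatial, "with encoder quota", quota)
```

In this toy setting, colocation wins whenever neither module can saturate the GPU alone, which is the utilization gap the abstract describes. A real system would need measured kernel behavior, inter-module dependencies, and communication costs, none of which are modeled here.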

Abstract

With the wide adoption of Multimodal Models (MMs) in real-world scenarios, it is important to efficiently train emerging MMs that exhibit increasingly complex module architectures. For MM deployment, existing works allocate a GPU to only one MM module in a temporal-multiplexing manner; this compromises training efficiency because a single module often fails to achieve high GPU utilization. To improve GPU utilization and enable efficient MM training, we propose deploying MMs in a temporal-spatial multiplexing manner, allowing multiple MM modules to colocate on a GPU with well-controlled resource quotas. In this paper, we propose Apollo, an efficient MM training system that applies temporal-spatial multiplexing. We first develop a flexible and lightweight execution engine that supports MM training with arbitrary resource quotas, and then build a comprehensive and accurate performance…
