Optimus: Accelerating Large-Scale Multi-Modal LLM Training by Bubble Exploitation

Weiqi Feng; Yangrui Chen; Shaoyu Wang; Yanghua Peng; Haibin Lin; Minlan Yu

arXiv:2408.03505 · cs.CL · June 3, 2025

TL;DR

Optimus is a distributed training system that shortens large-scale multimodal LLM (MLLM) training by scheduling encoder computation into the GPU bubbles of LLM training, using separate parallel plans for the encoder and the LLM.

Contribution

It introduces a bubble scheduling algorithm and a search over separate parallel plans for the encoder and the LLM, so that encoder computation can fill LLM bubbles on all GPUs.

Findings

01

Trains an MLLM built from ViT-22B and GPT-175B 20.5%-21.3% faster than baselines.

02

Reduces GPU idle time (bubbles) caused by heterogeneous modality models and complex data dependencies in 3D parallelism.

03

Demonstrates scalability on 3072 GPUs in a production environment.

Abstract

Multimodal large language models (MLLMs) have extended the success of large language models (LLMs) to multiple data types, such as image, text and audio, achieving significant performance in various domains, including multimodal translation, visual question answering and content generation. Nonetheless, existing systems train MLLMs inefficiently due to substantial GPU bubbles caused by the heterogeneous modality models and complex data dependencies in 3D parallelism. This paper proposes Optimus, a distributed MLLM training system that reduces end-to-end MLLM training time. Optimus is based on our principled analysis that scheduling the encoder computation within the LLM bubbles can reduce bubbles in MLLM training. To make scheduling encoder computation possible for all GPUs, Optimus searches for separate parallel plans for the encoder and the LLM, and adopts a bubble scheduling algorithm…
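
The core idea in the abstract, running encoder work inside otherwise idle LLM bubbles, can be illustrated with a toy sketch. This is not Optimus's bubble scheduling algorithm, which the truncated abstract does not describe. It is a simple in-order greedy fill under assumed inputs: the function name `fill_bubbles`, the list of bubble lengths, and the list of encoder kernel durations (all in milliseconds) are hypothetical and chosen only for illustration.

```python
def fill_bubbles(bubble_lengths, kernel_durations):
    """Toy in-order greedy fill (an illustrative assumption, not Optimus's algorithm).

    bubble_lengths:   idle intervals on one GPU, in time order (ms).
    kernel_durations: encoder kernels in dependency order (ms).
    Returns, for each kernel, the index of the bubble it is placed in,
    or None if it does not fit in any remaining bubble.
    """
    placement = []
    b = 0  # index of the bubble currently being filled
    free = bubble_lengths[0] if bubble_lengths else 0.0
    for dur in kernel_durations:
        # Only move forward through bubbles, so kernels stay in dependency order.
        while b < len(bubble_lengths) and dur > free:
            b += 1
            free = bubble_lengths[b] if b < len(bubble_lengths) else 0.0
        if b < len(bubble_lengths):
            placement.append(b)
            free -= dur
        else:
            placement.append(None)  # this kernel runs outside the bubbles
    return placement


bubbles = [5.0, 3.0, 8.0]
kernels = [2.0, 2.0, 2.0, 4.0, 5.0]
placement = fill_bubbles(bubbles, kernels)
hidden = sum(d for d, p in zip(kernels, placement) if p is not None)
print(placement, hidden)
```

In this example the first four kernels (10 ms in total) are hidden inside bubbles, and the last one does not fit and would add to the step time. A real scheduler also has to account for communication, memory limits, and dependencies across GPUs, which this sketch ignores.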
