DHP: Efficient Scaling of MLLM Training with Dynamic Hybrid Parallelism
Yifan Niu, Han Xiao, Dongyi Liu, Wei Zhou, Jia Li

TL;DR
This paper introduces Dynamic Hybrid Parallelism (DHP), a novel adaptive parallelism strategy for training Multimodal Large Language Models that improves efficiency and scalability under heterogeneous data conditions.
Contribution
The paper proposes DHP, a dynamic parallelism method that adaptively reconfigures communication groups and parallelism degrees during training. It supports non-power-of-two parallelism degrees and adds only millisecond-level planning overhead per batch.
Findings
DHP achieves up to 1.36× speedup over existing methods.
DHP maintains near-linear scaling efficiency across large-scale clusters.
DHP effectively handles data heterogeneity with high hardware utilization.
Abstract
Scaling long-context capabilities is crucial for Multimodal Large Language Models (MLLMs). However, real-world multimodal datasets are extremely heterogeneous. Existing training frameworks predominantly rely on static parallelism strategies, which suffer from severe load imbalance, redundant communication, and suboptimal hardware utilization under data heterogeneity. In this work, we propose Dynamic Hybrid Parallelism (DHP), an efficient parallelism strategy that adaptively reconfigures communication groups and parallelism degrees during MLLM training. We generalize parallelism to non-power-of-two degrees and develop a polynomial-time algorithm that generates near-optimal parallelism strategies with only millisecond-level overhead per training batch. DHP maintains high hardware efficiency even under extreme data variability. Experimental results demonstrate that DHP…
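The abstract does not describe DHP's cost model or its polynomial-time planning algorithm. As a rough illustration of why non-power-of-two degrees help with heterogeneous sequence lengths, here is a minimal sketch, assuming a hypothetical per-sequence cost model (a linear term plus a quadratic attention term) and a simple greedy planner. It is not the authors' method. The names `estimate_cost`, `allocate_degrees`, `build_groups`, `max_load`, `alpha`, and `beta` are all hypothetical. The planner gives every sequence one GPU, then repeatedly hands the next GPU to the sequence with the highest per-GPU load. It runs in polynomial time, O((N - n) * n) for N GPUs and n sequences.

```python
from typing import List


def estimate_cost(seq_len: int, alpha: float = 1.0, beta: float = 1e-3) -> float:
    # Hypothetical cost model: linear term (projections/MLP) + quadratic term (attention).
    return alpha * seq_len + beta * seq_len ** 2


def allocate_degrees(seq_lens: List[int], num_gpus: int) -> List[int]:
    """Assign each sequence an integer parallelism degree (any integer, not only
    powers of two) so that the degrees sum to num_gpus.

    Greedy sketch: start every sequence at degree 1, then give each remaining GPU
    to the sequence whose per-GPU load (cost / degree) is currently largest.
    """
    if len(seq_lens) > num_gpus:
        raise ValueError("this sketch needs at least one GPU per sequence")
    costs = [estimate_cost(s) for s in seq_lens]
    degrees = [1] * len(seq_lens)
    for _ in range(num_gpus - len(seq_lens)):
        # Find the current bottleneck sequence and give it one more GPU.
        i = max(range(len(costs)), key=lambda k: costs[k] / degrees[k])
        degrees[i] += 1
    return degrees


def build_groups(degrees: List[int]) -> List[List[int]]:
    # Map degrees to contiguous rank groups, e.g. [5, 1, 1, 1] -> [[0..4], [5], [6], [7]].
    groups, start = [], 0
    for d in degrees:
        groups.append(list(range(start, start + d)))
        start += d
    return groups


def max_load(seq_lens: List[int], degrees: List[int]) -> float:
    # Bottleneck per-GPU load under the hypothetical cost model.
    return max(estimate_cost(s) / d for s, d in zip(seq_lens, degrees))


if __name__ == "__main__":
    batch = [32000, 8000, 4000, 2000]  # heterogeneous sequence lengths (illustrative)
    dyn = allocate_degrees(batch, num_gpus=8)
    print(dyn, build_groups(dyn))
    print("dynamic max load:", max_load(batch, dyn))
    print("static degree-2 max load:", max_load(batch, [2, 2, 2, 2]))
```

With 8 GPUs and this illustrative batch, the greedy planner picks degrees [5, 1, 1, 1]. The long sequence gets a group of 5 GPUs, a size that a powers-of-two-only scheme cannot form. Under the assumed cost model, this lowers the bottleneck per-GPU load from 528,000 (a static degree of 2 for every sequence) to 211,200. A real system would also have to account for communication cost and group-creation overhead, which this sketch ignores.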
