DHP: Efficient Scaling of MLLM Training with Dynamic Hybrid Parallelism

Yifan Niu; Han Xiao; Dongyi Liu; Wei Zhou; Jia Li

arXiv:2602.21788·cs.DC·February 26, 2026


Open Access

TL;DR

This paper introduces Dynamic Hybrid Parallelism (DHP), a novel adaptive parallelism strategy for training Multimodal Large Language Models that improves efficiency and scalability under heterogeneous data conditions.

Contribution

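The paper proposes DHP, a dynamic parallelism method that adaptively reconfigures communication groups and parallelism degrees during training. It supports non-power-of-two parallelism degrees and uses a polynomial-time algorithm to generate near-optimal strategies with millisecond-level overhead per training batch.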

Findings

01. DHP achieves up to 1.36× speedup over existing methods.

02. DHP maintains near-linear scaling efficiency across large-scale clusters.

03. DHP effectively handles data heterogeneity with high hardware utilization.

Abstract

Scaling long-context capabilities is crucial for Multimodal Large Language Models (MLLMs). However, real-world multimodal datasets are extremely heterogeneous. Existing training frameworks predominantly rely on static parallelism strategies, which suffer from severe load imbalance, redundant communication, and suboptimal hardware utilization under data heterogeneity. In this work, we propose Dynamic Hybrid Parallelism (DHP), an efficient parallelism strategy that adaptively reconfigures communication groups and parallelism degrees during MLLM training. We generalize the non-power-of-two parallelism degrees and develop a polynomial-time algorithm to generate near-optimal parallelism strategies with only millisecond-level overhead per training batch. DHP is able to maintain high hardware efficiency even under extreme data variability. Experimental results demonstrate that DHP…
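
To make the idea of per-batch, non-power-of-two parallelism degrees concrete, here is a minimal Python sketch. It is not the paper's algorithm, which this page does not describe. It assumes a toy cost model in which a sequence's work is proportional to its token count and splits evenly across the GPUs assigned to it, with communication cost ignored. The names `allocate_degrees` and `max_load` are hypothetical and exist only for illustration.

```python
import heapq


def allocate_degrees(seq_lengths, num_gpus):
    """Greedy toy allocator (illustrative only, not the DHP algorithm).

    Gives every sequence one GPU, then hands each remaining GPU to the
    sequence whose current per-GPU load (tokens / degree) is largest.
    Degrees may be any positive integer, not just powers of two.
    """
    if len(seq_lengths) > num_gpus:
        raise ValueError("toy model needs at least one GPU per sequence")
    degrees = [1] * len(seq_lengths)
    # heapq is a min-heap, so store negated loads to pop the largest first.
    heap = [(-length, i) for i, length in enumerate(seq_lengths)]
    heapq.heapify(heap)
    for _ in range(num_gpus - len(seq_lengths)):
        _, i = heapq.heappop(heap)
        degrees[i] += 1
        heapq.heappush(heap, (-seq_lengths[i] / degrees[i], i))
    return degrees


def max_load(seq_lengths, degrees):
    """Largest per-GPU token load under the assumed linear cost model."""
    return max(length / d for length, d in zip(seq_lengths, degrees))


# Assumed example batch: one long sequence and two short ones on 8 GPUs.
lengths = [12000, 3000, 1000]
degrees = allocate_degrees(lengths, 8)
print(degrees, max_load(lengths, degrees))  # [5, 2, 1] 2400.0
```

In this example the long sequence receives five GPUs. Under the same toy model, restricting degrees to powers of two (for instance 4, 2, 1 or 4, 2, 2) would leave the heaviest per-GPU load at 3000 tokens instead of 2400, which is one way to see why non-power-of-two degrees can help with heterogeneous batches.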


Taxonomy

Topics: Advanced Neural Network Applications · Natural Language Processing Techniques · Topic Modeling