DFLOP: A Data-driven Framework for Multimodal LLM Training Pipeline Optimization

Hyeonjun An; Sihyun Kim; Chaerim Lim; Hyunjoon Kim; Rathijit Sen; Sangmin Jung; Hyeonsoo Lee; Dongwook Kim; Takki Yu; Jinkyu Jeong; Youngsok Kim; Kwanghyun Park

arXiv:2603.25120 · cs.DC · May 20, 2026

TL;DR

DFLOP is a data-driven framework that optimizes multimodal LLM training by profiling data characteristics and balancing workloads across pipeline stages and microbatches, improving GPU utilization and training up to 3.6x faster than existing frameworks.

Contribution

DFLOP introduces a data-aware scheduling approach for multimodal LLM training that addresses the computation skew caused by heterogeneous input data.

Findings

01. DFLOP achieves up to 3.6x faster training compared to existing frameworks.

02. It effectively balances workloads across stages and microbatches.

03. The framework improves GPU utilization and overall training efficiency.

Abstract

Multimodal Large Language Models (MLLMs) have achieved remarkable advances by integrating text, image, and audio understanding within a unified architecture. However, existing distributed training frameworks remain fundamentally data-blind: they parallelize computation without accounting for variations in input data characteristics. This data unawareness leads to severe computation skew across stages and microbatches, where heterogeneous multimodal inputs incur different processing costs. Consequently, GPU resources are unevenly utilized, synchronization delays accumulate, and overall training efficiency degrades. To address this limitation, we present DFLOP, a data-driven framework for multimodal LLM training pipeline optimization. DFLOP continuously profiles runtime behavior to capture data-induced computation variance and employs predictive scheduling to balance workloads across…
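The idea of balancing microbatches by predicted cost can be illustrated with a small sketch. This is not DFLOP's algorithm, which the abstract does not spell out; it swaps in a generic greedy longest-processing-time heuristic. The function names, the `text_tokens` and `image_tokens` fields, and the cost weights are all hypothetical placeholders chosen for illustration.

```python
import heapq
from typing import Dict, List, Sequence


def estimate_cost(sample: Dict[str, int]) -> float:
    """Hypothetical cost model: linear in per-modality token counts.

    The weights are illustrative placeholders, not profiled values.
    """
    return 1.0 * sample.get("text_tokens", 0) + 2.5 * sample.get("image_tokens", 0)


def balance_microbatches(
    samples: Sequence[Dict[str, int]], num_microbatches: int
) -> List[List[int]]:
    """Assign sample indices to microbatches with a greedy LPT heuristic.

    Samples are visited from most to least expensive, and each one goes
    to the microbatch with the smallest accumulated estimated cost.
    """
    if num_microbatches <= 0:
        raise ValueError("num_microbatches must be positive")

    costs = [estimate_cost(s) for s in samples]
    order = sorted(range(len(samples)), key=lambda i: costs[i], reverse=True)

    # Min-heap of (accumulated cost, microbatch id).
    heap = [(0.0, b) for b in range(num_microbatches)]
    heapq.heapify(heap)

    assignment: List[List[int]] = [[] for _ in range(num_microbatches)]
    for i in order:
        load, b = heapq.heappop(heap)
        assignment[b].append(i)
        heapq.heappush(heap, (load + costs[i], b))
    return assignment


def microbatch_loads(
    samples: Sequence[Dict[str, int]], assignment: List[List[int]]
) -> List[float]:
    """Total estimated cost of each microbatch."""
    return [sum(estimate_cost(samples[i]) for i in batch) for batch in assignment]
```

A data-blind baseline would split samples in arrival order, so a run of image-heavy samples could land in the same microbatch. In this sketch the spread between the heaviest and lightest microbatch never exceeds the cost of the single most expensive sample, a known property of the greedy heuristic.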

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques