TL;DR
DynaFlow is a framework that enables transparent, flexible intra-device parallelism in ML systems, improving resource utilization and throughput without invasive code changes.
Contribution
It introduces a programmable interface and asynchronous control/data-flow management to decouple logical models from physical execution schedules.
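To make the idea of decoupling concrete, here is a minimal sketch in plain Python. It is not DynaFlow's API: the `Op` class, the `LAYER` list, and both schedule functions are hypothetical names invented for illustration, and the schedule is only modeled as data rather than run on a device. The model is written once as a sequential list of operators, and a separate function decides the physical schedule by staggering two micro-batches so that each step pairs operators that use different resources.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """Hypothetical operator record: a name plus the resource it mostly uses."""
    name: str
    resource: str  # e.g. "compute" or "comm"


# Logical model: a plain sequential operator list, written once.
# The operator names are illustrative assumptions, not taken from the paper.
LAYER = [
    Op("attention", "compute"),
    Op("all_reduce_1", "comm"),
    Op("mlp", "compute"),
    Op("all_reduce_2", "comm"),
]


def sequential_schedule(model, num_microbatches):
    """Baseline schedule: one operator per step, micro-batch after micro-batch."""
    return [[(mb, op)] for mb in range(num_microbatches) for op in model]


def overlapped_schedule(model):
    """Stagger two micro-batches by one operator.

    At step t, micro-batch 0 runs operator t and micro-batch 1 runs
    operator t-1. If both would use the same resource, they are
    serialized instead, since overlapping them would not help.
    """
    n = len(model)
    steps = []
    for t in range(n + 1):
        slot = []
        if t < n:
            slot.append((0, model[t]))
        if t >= 1:
            slot.append((1, model[t - 1]))
        same_resource = (
            len(slot) == 2 and slot[0][1].resource == slot[1][1].resource
        )
        if same_resource:
            steps.extend([[slot[1]], [slot[0]]])
        else:
            steps.append(slot)
    return steps


if __name__ == "__main__":
    for i, step in enumerate(overlapped_schedule(LAYER)):
        print(i, [(mb, op.name) for mb, op in step])
```

The point of the sketch is that `LAYER` never changes: swapping `sequential_schedule` for `overlapped_schedule` alters only the execution plan. In a real system the paired operators would have to be launched concurrently on the device (for example on separate CUDA streams), which this sketch does not attempt.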
Findings
Achieves up to 1.29x throughput improvement.
Integrates parallelism strategies into 6 ML systems with minimal code changes.
Maintains compatibility with CUDA Graphs and TorchInductor.
Abstract
Intra-device parallelism addresses resource under-utilization in ML inference and training by overlapping the execution of operators with different resource usage. However, its wide adoption is hindered by a fundamental conflict with the static, sequential programming model of existing frameworks. Integrating these strategies requires invasive, model-specific code overhauls, representing an intractable engineering cost. This is further amplified by the high sensitivity of strategies to execution contexts (e.g., workload, model architecture, hardware), forcing developers to implement and maintain multiple specialized solutions. To address this, we propose DynaFlow, a framework that enables the transparent and flexible integration of intra-device parallelism by decoupling the logical model definition from the physical execution schedule. DynaFlow introduces a flexible frontend with…
