DynaFlow: Transparent and Flexible Intra-Device Parallelism via Programmable Operator Scheduling

Yi Pan; Yile Gu; Jinbin Luo; Yibo Wu; Ziren Wang; Hongtao Zhang; Ziyi Xu; Shengkai Lin; Baris Kasikci; Stephanie Wang

arXiv:2605.21603·cs.DC·May 22, 2026

TL;DR

DynaFlow is a framework that enables transparent, flexible intra-device parallelism in ML systems, improving resource utilization and throughput without invasive code changes.

Contribution

DynaFlow introduces a programmable scheduling interface and asynchronous control-flow and data-flow management that decouple the logical model definition from the physical execution schedule.
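To make the decoupling idea concrete, here is a minimal toy sketch in plain Python. It is not DynaFlow's API: the names `run`, `sequential`, and `interleaved` are hypothetical, and the "model" is just a list of scalar functions. It only shows that one logical operator list can run under two interchangeable schedules and still produce the same outputs, as long as each schedule preserves the operator order within a micro-batch.

```python
from typing import Callable, List, Tuple

# Hypothetical types for illustration only (not taken from DynaFlow).
Op = Tuple[str, Callable[[float], float]]   # (operator name, function)
Step = Tuple[int, int]                      # (operator index, micro-batch index)


def sequential(num_ops: int, num_mb: int) -> List[Step]:
    """Run every operator of micro-batch 0, then every operator of micro-batch 1, ..."""
    return [(i, m) for m in range(num_mb) for i in range(num_ops)]


def interleaved(num_ops: int, num_mb: int) -> List[Step]:
    """Wavefront order: operator i of micro-batch m is issued at step i + m.

    Operator i - 1 of the same micro-batch always has a smaller step,
    so the per-micro-batch dependencies are preserved.
    """
    steps = [(i, m) for m in range(num_mb) for i in range(num_ops)]
    return sorted(steps, key=lambda s: (s[0] + s[1], s[0]))


def run(model: List[Op], inputs: List[float],
        schedule: Callable[[int, int], List[Step]]) -> Tuple[List[float], List[str]]:
    """Execute the logical model under a given schedule; return outputs and issue trace."""
    state = list(inputs)
    trace = []
    for i, m in schedule(len(model), len(inputs)):
        name, fn = model[i]
        state[m] = fn(state[m])
        trace.append(f"{name}[mb{m}]")
    return state, trace


# The logical model is written once and never mentions scheduling.
model: List[Op] = [("scale", lambda x: x * 2.0), ("shift", lambda x: x + 1.0)]
inputs = [1.0, 2.0]

out_seq, trace_seq = run(model, inputs, sequential)
out_int, trace_int = run(model, inputs, interleaved)
print(out_seq, trace_seq)
print(out_int, trace_int)
```

Swapping the schedule changes only the issue order recorded in the trace; the model definition and its outputs stay the same, which is the property a transparent scheduling layer needs.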

Findings

1. Achieves up to 1.29x throughput improvement.
2. Integrates parallelism strategies into 6 ML systems with minimal code changes.
3. Maintains compatibility with CUDA Graphs and TorchInductor.

Abstract

Intra-device parallelism addresses resource under-utilization in ML inference and training by overlapping the execution of operators with different resource usage. However, its wide adoption is hindered by a fundamental conflict with the static, sequential programming model of existing frameworks. Integrating these strategies requires invasive, model-specific code overhauls, representing an intractable engineering cost. This is further amplified by the high sensitivity of strategies to execution contexts (e.g., workload, model architecture, hardware), forcing developers to implement and maintain multiple specialized solutions. To address this, we propose DynaFlow, a framework that enables the transparent and flexible integration of intra-device parallelism by decoupling the logical model definition from the physical execution schedule. DynaFlow introduces a flexible frontend with…
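The benefit of overlapping operators with different resource usage can be illustrated with a small timing simulation. This is a toy model under stated assumptions, not a reproduction of the paper's method or results: the operator names, the two resources (`compute` and `memory`), the durations, and the function `simulate` are all made up for illustration, and each resource is assumed to run one operator at a time.

```python
from typing import Dict, List, Tuple

# Hypothetical operator description: (name, resource it saturates, duration).
Op = Tuple[str, str, float]


def simulate(ops: List[Op], num_mb: int, overlap: bool) -> float:
    """Return the finish time of running `ops` over `num_mb` micro-batches.

    overlap=False: one operator at a time, micro-batch after micro-batch.
    overlap=True:  operators are issued in wavefront order and may run
                   concurrently when they use different resources.
    """
    steps = [(i, m) for m in range(num_mb) for i in range(len(ops))]
    if overlap:
        # Operator i of micro-batch m is issued at step i + m.
        steps.sort(key=lambda s: (s[0] + s[1], s[0]))

    resource_free: Dict[str, float] = {}   # time each resource becomes idle
    mb_ready = [0.0] * num_mb              # time each micro-batch's last op ended
    finish = 0.0
    for i, m in steps:
        _, resource, duration = ops[i]
        if overlap:
            # Wait for the previous op of this micro-batch and for the resource.
            start = max(mb_ready[m], resource_free.get(resource, 0.0))
        else:
            # Strictly sequential: wait for everything issued so far.
            start = finish
        end = start + duration
        mb_ready[m] = end
        resource_free[resource] = end
        finish = max(finish, end)
    return finish


# Toy layer stack alternating compute-bound and memory-bound operators.
OPS: List[Op] = [
    ("op_a", "compute", 2.0),
    ("op_b", "memory", 2.0),
    ("op_c", "compute", 2.0),
    ("op_d", "memory", 2.0),
]

seq_time = simulate(OPS, num_mb=2, overlap=False)
ovl_time = simulate(OPS, num_mb=2, overlap=True)
print(f"sequential: {seq_time}, overlapped: {ovl_time}")
```

In this toy setup the overlapped schedule finishes sooner because one micro-batch's memory-bound operator runs while the other micro-batch's compute-bound operator occupies the compute resource. The size of the gain depends entirely on the invented durations and says nothing about the throughput figures reported for DynaFlow.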

Code & Models

Repositories

uw-syfi/DynaFlow (GitHub)
