DOPPLER: Dual-Policy Learning for Device Assignment in Asynchronous Dataflow Graphs

Xinyu Yao; Daniel Bourgeois; Abhinav Jain; Yuxin Tang; Jiawen Yao; Zhimin Ding; Arlei Silva; Chris Jermaine

arXiv:2505.23131·cs.LG·May 30, 2025

Open Access · 3 Reviews

TL;DR

DOPPLER introduces a dual-policy learning framework for device assignment in dataflow graphs, reducing execution time and improving training efficiency for complex machine learning workloads.

Contribution

It presents a novel three-stage dual-policy framework that incorporates system-aware heuristics, addressing limitations of prior reinforcement learning approaches.

Findings

01 · Outperforms baseline methods in reducing execution time.

02 · Demonstrates higher sampling efficiency and faster training.

03 · Effective in complex machine learning workloads.

Abstract

We study the problem of assigning operations in a dataflow graph to devices to minimize execution time in a work-conserving system, with emphasis on complex machine learning workloads. Prior learning-based methods often struggle due to three key limitations: (1) reliance on bulk-synchronous systems like TensorFlow, which under-utilize devices due to barrier synchronization; (2) lack of awareness of the scheduling mechanism of underlying systems when designing learning-based methods; and (3) exclusive dependence on reinforcement learning, ignoring the structure of effective heuristics designed by experts. In this paper, we propose DOPPLER, a three-stage framework for training dual-policy networks consisting of (1) a SEL policy for selecting operations and (2) a PLC policy for placing chosen operations on devices. Our experiments show that DOPPLER…
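The select-then-place loop described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the paper's method: the candidate set (unplaced operations whose predecessors are all placed), the fixed per-transfer cost `comm`, the simplified timing model, and the hand-written stand-ins `sel_largest` and `plc_earliest`, which take the place of the learned SEL and PLC policies.

```python
from typing import Callable, Dict, List, Tuple

Preds = Dict[str, List[str]]  # node -> list of its predecessor nodes


def earliest_start(node: str, dev: int, preds: Preds, placement: Dict[str, int],
                   finish: Dict[str, float], dev_free: List[float], comm: float) -> float:
    """Earliest time `node` can start on `dev`, given its already-placed predecessors."""
    ready = 0.0
    for p in preds[node]:
        # Hypothetical cost model: a fixed transfer delay when devices differ.
        transfer = comm if placement[p] != dev else 0.0
        ready = max(ready, finish[p] + transfer)
    return max(ready, dev_free[dev])


def assign(preds: Preds, cost: Dict[str, float], n_devices: int, comm: float,
           sel: Callable[[List[str], Dict[str, float]], str],
           plc: Callable[[str, List[float]], int]) -> Tuple[Dict[str, int], float]:
    """Alternate a selection policy and a placement policy until every node is placed."""
    placement: Dict[str, int] = {}
    finish: Dict[str, float] = {}
    dev_free = [0.0] * n_devices
    while len(placement) < len(preds):
        # Assumed candidate set: unplaced nodes whose predecessors are all placed.
        candidates = [v for v in preds
                      if v not in placement and all(p in placement for p in preds[v])]
        node = sel(candidates, cost)                      # SEL: which operation next
        est = [earliest_start(node, d, preds, placement, finish, dev_free, comm)
               for d in range(n_devices)]
        dev = plc(node, est)                              # PLC: which device gets it
        placement[node] = dev
        finish[node] = est[dev] + cost[node]
        dev_free[dev] = finish[node]
    return placement, max(finish.values())


def sel_largest(candidates: List[str], cost: Dict[str, float]) -> str:
    """Stand-in for SEL: pick the most expensive candidate, ties broken by name."""
    return min(candidates, key=lambda v: (-cost[v], v))


def plc_earliest(node: str, est: List[float]) -> int:
    """Stand-in for PLC: pick the device with the earliest start, ties to lowest id."""
    return min(range(len(est)), key=lambda d: (est[d], d))


# Toy graph: c consumes the outputs of a and b.
preds = {"a": [], "b": [], "c": ["a", "b"]}
cost = {"a": 2.0, "b": 2.0, "c": 1.0}
placement, makespan = assign(preds, cost, 2, 0.5, sel_largest, plc_earliest)
print(placement, makespan)
```

In this toy setup the two independent operations land on different devices and the join pays one transfer, so two devices finish sooner than one. Swapping `sel_largest` and `plc_earliest` for learned policies would leave the loop itself unchanged, which is the point of factoring the decision into two steps.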

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 2 · Confidence 4

Strengths

The problem has a clear motivation, illustrated by a dataflow graph, in the introduction section for matrix multiplication in deep learning operations.

Weaknesses

#1 It is not easy for readers to follow the logic of this paper. The paper fails to clearly describe the problem and methods.
#2 The abstract is too short to demonstrate the significance of DOPPLER. There are no quantitative results in the abstract.
#3 A “simulator” for system execution is used throughout the paper without any description. Please clarify.
#4 Since Section 3 is titled as the problem definition, is Section 2 just background? More specifically, is Algorithm 1 an existing algor

Reviewer 02 · Rating 8 · Confidence 2

Strengths

- Very significant practical impact: performance gains over the best baselines on the dataflow graph assignment problem.
- Clever design choices that enable both scalable training and improved performance.
- Pretraining techniques that improve performance during the actual RL phase.
- Clever message-passing implementation that reduces training time with negligible performance impact.
- Strong generalization across hardware architectures.

Weaknesses

- Evaluation seems limited to relatively small graphs (~200 nodes). Real-life workloads could be much bigger, which could affect the linear scalability claim. In my personal experience, scheduling larger graphs sometimes introduces additional complexity that is not captured on smaller graphs.
- Dynamic dataflow: I could imagine that the dataflow may change dynamically during execution, especially with larger computational graphs. Is this a setting your algorithm can handle because from

Reviewer 03 · Rating 4 · Confidence 3

Strengths

1. **Important Problem:** The paper correctly identifies a key inefficiency in modern ML systems: bulk-synchronous execution (e.g., all-reduce) leads to device idle time [cite: 69-72]. It convincingly argues that asynchronous, work-conserving (WC) systems offer significant speedups (Table 1) [cite: 161-163], but that optimizing for them is a more complex temporal problem [cite: 169-171].
2. **Comprehensive Training Framework:** The three-stage training paradigm (Stage I: Im

Weaknesses

1. **Weak Motivation for Dual Policy:** The paper's central contribution is the "dual-policy" (SEL+PLC) architecture [cite: 61, 174-176, 182]. However, the *reason* for factoring the policy this way is poorly justified. The paper never compares this two-policy agent against the most obvious and standard baseline: a *single* policy that directly outputs a `(node, device)` pair. The ablation in Table 3 [cite: 646-649] only compares DOPPLER to variants where one o

Taxonomy

Topics: Cloud Computing and Resource Management · Software System Performance and Reliability · Neural Networks and Reservoir Computing