GDP: Generalized Device Placement for Dataflow Graphs
Yanqi Zhou, Sudip Roy, Amirali Abdolrashidi, Daniel Wong, Peter C. Ma, Qiumin Xu, Ming Zhong, Hanxiao Liu, Anna Goldie, Azalia Mirhoseini, James Laudon

TL;DR
This paper introduces a scalable, transferable neural network-based method for optimizing device placement in dataflow graphs of neural networks, significantly improving performance and convergence speed over existing approaches.
Contribution
It presents a novel end-to-end approach, built on graph neural networks and attention mechanisms, that generalizes to new graphs and reduces the compute needed to find a placement.
Findings
Achieves an average 16% improvement over human experts
Achieves on average 9.2% better results than prior methods
Converges 15 times faster than previous approaches
Abstract
Runtime and scalability of large neural networks can be significantly affected by the placement of operations in their dataflow graphs on suitable devices. With increasingly complex neural network architectures and heterogeneous device characteristics, finding a reasonable placement is extremely challenging even for domain experts. Most existing automated device placement approaches are impractical due to the significant amount of compute required and their inability to generalize to new, previously held-out graphs. To address both limitations, we propose an efficient end-to-end method based on a scalable sequential attention mechanism over a graph neural network that is transferable to new graphs. On a diverse set of representative deep learning models, including Inception-v3, AmoebaNet, Transformer-XL, and WaveNet, our method on average achieves 16% improvement over human experts and…
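To make the abstract's pipeline more concrete, here is a minimal, untrained sketch of the general idea: a graph neural network embeds each operation of a dataflow graph using its neighbours, an attention layer lets every op's decision see the whole graph, and a linear head scores each device. This is not the authors' implementation. All function names (`gnn_layer`, `self_attention`, `place`), the toy op features, the random weights, the single full self-attention pass (instead of a scalable, segment-wise sequential attention), and the greedy argmax choice are hypothetical simplifications. In practice such a policy would be trained, for example with a reinforcement-learning reward derived from measured runtime.

```python
import math
import random

def matvec(w, x):
    # Multiply matrix w (list of rows) by vector x.
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def rand_matrix(rows, cols, rng):
    # Hypothetical untrained weights; a real placer would learn these.
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def gnn_layer(feats, adj, w_self, w_neigh):
    # One round of message passing: each op combines its own features with
    # the mean of its neighbours' features, followed by ReLU.
    out = {}
    for v, h in feats.items():
        nbrs = adj.get(v, [])
        if nbrs:
            mean = [sum(feats[u][i] for u in nbrs) / len(nbrs) for i in range(len(h))]
        else:
            mean = [0.0] * len(h)
        z = [a + b for a, b in zip(matvec(w_self, h), matvec(w_neigh, mean))]
        out[v] = [max(0.0, x) for x in z]
    return out

def self_attention(embs):
    # Scaled dot-product attention over all op embeddings, so each op's
    # placement decision can depend on every other op in the graph.
    d = len(embs[0])
    out = []
    for q in embs:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in embs]
        weights = softmax(scores)
        out.append([sum(w * k[i] for w, k in zip(weights, embs)) for i in range(d)])
    return out

def place(feats, edges, num_devices, hidden=8, seed=0):
    rng = random.Random(seed)
    # Treat dataflow edges as undirected for neighbourhood aggregation.
    adj = {v: [] for v in feats}
    for src, dst in edges:
        adj[src].append(dst)
        adj[dst].append(src)
    in_dim = len(next(iter(feats.values())))
    h = gnn_layer(feats, adj,
                  rand_matrix(hidden, in_dim, rng),
                  rand_matrix(hidden, in_dim, rng))
    order = sorted(feats)  # fixed op order fed to the attention placer
    ctx = self_attention([h[v] for v in order])
    w_out = rand_matrix(num_devices, hidden, rng)
    placement = {}
    for v, c in zip(order, ctx):
        probs = softmax(matvec(w_out, c))  # distribution over devices
        placement[v] = max(range(num_devices), key=lambda dev: probs[dev])
    return placement

if __name__ == "__main__":
    ops = {"conv1": [1.0, 0.0, 2.0], "relu1": [0.0, 1.0, 0.5],
           "fc": [2.0, 1.0, 0.0], "loss": [0.5, 0.5, 0.5]}
    edges = [("conv1", "relu1"), ("relu1", "fc"), ("fc", "loss")]
    print(place(ops, edges, num_devices=2))  # op name -> device index
```

Because the weights are random, the printed placement is arbitrary; the sketch only shows how graph structure and attention feed a per-op device distribution. Sharing one set of weights across all ops is what lets a policy of this shape run on graphs it has not seen before.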
Taxonomy
Topics: Ferroelectric and Negative Capacitance Devices · Advanced Memory and Neural Computing · Parallel Computing and Optimization Techniques
Methods: Graph Neural Network · Linear Layer · Cosine Annealing · Mixture of Logistic Distributions · RMSProp · Residual Connection · Variational Dropout · Convolution · Average Pooling · Auxiliary Classifier
