GraphPipe: Improving Performance and Scalability of DNN Training with Graph Pipeline Parallelism
Byungsoo Jeon, Mengdi Wu, Shiyi Cao, Sunghyun Kim, Sunghyun Park, Neeraj Aggarwal, Colin Unger, Daiyaan Arfeen, Peiyuan Liao, Xupeng Miao, Mohammad Alizadeh, Gregory R. Ganger, Tianqi Chen, Zhihao Jia

TL;DR
GraphPipe introduces graph pipeline parallelism (GPP), a novel approach that leverages DNN topology to improve training performance and scalability by enabling concurrent execution of independent operators.
Contribution
The paper proposes GPP, a new pipeline-parallel scheme that preserves DNN topology for better concurrency, and develops GraphPipe, a system that implements GPP for scalable DNN training.
Findings
GraphPipe outperforms PipeDream and Piper by up to 1.6X in training speed.
GraphPipe reduces search time by 9-21X compared to existing systems.
GPP enables concurrent execution of independent operators, improving GPU performance.
Abstract
Deep neural networks (DNNs) continue to grow rapidly in size, making them infeasible to train on a single device. Pipeline parallelism is commonly used in existing DNN systems to support large-scale DNN training by partitioning a DNN into multiple stages, which concurrently perform DNN training for different micro-batches in a pipeline fashion. However, existing pipeline-parallel approaches only consider sequential pipeline stages and thus ignore the topology of a DNN, resulting in missed model-parallel opportunities. This paper presents graph pipeline parallelism (GPP), a new pipeline-parallel scheme that partitions a DNN into pipeline stages whose dependencies are identified by a directed acyclic graph. GPP generalizes existing sequential pipeline parallelism and preserves the inherent topology of a DNN to enable concurrent execution of computationally-independent operators, resulting…
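The contrast between sequential stages and stages arranged as a directed acyclic graph can be illustrated with a small sketch. This is not GraphPipe's partitioner or scheduler; the stage names, the two-branch model, and the helper functions `pipeline_depth` and `concurrent_levels` are hypothetical and exist only for illustration. The sketch counts the longest chain of dependent stages and groups stages that have no dependencies on each other, using Python's standard `graphlib` module.

```python
from graphlib import TopologicalSorter


def pipeline_depth(stage_deps):
    """Number of stages on the longest dependency chain of the stage graph."""
    depth = {}
    # static_order() yields every stage after all of its predecessors.
    for stage in TopologicalSorter(stage_deps).static_order():
        preds = stage_deps.get(stage, ())
        depth[stage] = 1 + max((depth[p] for p in preds), default=0)
    return max(depth.values(), default=0)


def concurrent_levels(stage_deps):
    """Group stages into levels; stages within a level do not depend on each other."""
    sorter = TopologicalSorter(stage_deps)
    sorter.prepare()
    levels = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        levels.append(ready)
        sorter.done(*ready)
    return levels


# Hypothetical two-branch model (assumption): each key is a stage,
# each value lists the stages it depends on.
graph_stages = {
    "branch_a1": [],
    "branch_a2": ["branch_a1"],
    "branch_b1": [],
    "branch_b2": ["branch_b1"],
    "fusion": ["branch_a2", "branch_b2"],
}

# The same stages forced into one sequential chain, as a
# sequential pipeline-parallel scheme would arrange them.
order = ["branch_a1", "branch_a2", "branch_b1", "branch_b2", "fusion"]
sequential_stages = {
    stage: ([order[i - 1]] if i > 0 else []) for i, stage in enumerate(order)
}

print(pipeline_depth(sequential_stages))  # 5
print(pipeline_depth(graph_stages))       # 3
print(concurrent_levels(graph_stages))
# [['branch_a1', 'branch_b1'], ['branch_a2', 'branch_b2'], ['fusion']]
```

In this toy example the sequential chain places all five stages on one dependency path, while the graph arrangement leaves only three stages on the longest path, because the two branches never wait on each other. How GraphPipe actually chooses stages and schedules micro-batches is not covered by this sketch.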
Taxonomy
Topics: Brain Tumor Detection and Classification · IoT and Edge/Fog Computing
Methods: PipeDream
