Using GPI-2 for Distributed Memory Paralleliziation of the Caffe Toolbox to Speed up Deep Neural Network Training

Martin Kuehn; Janis Keuper; Franz-Josef Pfreundt

arXiv:1706.00095·cs.LG·August 21, 2017·2 cites

Open Access

TL;DR

This paper presents CaffeGPI, an extension of the Caffe deep learning framework utilizing GPI-2 for efficient distributed memory parallelization, achieving better scalability and faster training of deep neural networks on HPC hardware.

Contribution

The paper introduces CaffeGPI, a novel parallel version of Caffe that leverages GPI-2 for asynchronous communication, improving scalability and training speed in distributed environments.

Findings

01

CaffeGPI scales better than other extensions like Intel™ Caffe.

02

Within a single machine with 4 GPUs, CaffeGPI outperforms standard Caffe.

03

Initial benchmarks show improved scalability and training efficiency.

Abstract

Deep Neural Networks (DNNs) are currently of great interest in research and application. The training of these networks is a compute-intensive and time-consuming task. To reduce training times to a bearable amount at reasonable cost, we extend the popular Caffe toolbox for DNNs with an efficient distributed memory communication pattern. To achieve good scalability, we emphasize the overlap of computation and communication and prefer fine-granular synchronization patterns over global barriers. To implement these communication patterns, we rely on the Global address space Programming Interface version 2 (GPI-2) communication library. This interface provides a lightweight set of asynchronous one-sided communication primitives supplemented by non-blocking fine-granular data synchronization mechanisms. Therefore, CaffeGPI is the name of our parallel version of Caffe. First…
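
The communication pattern the abstract describes (gradient exchange overlapped with computation, synchronized per layer rather than at a global barrier) can be illustrated with a minimal Python sketch. It does not use GPI-2 or Caffe and is not the authors' implementation: a background thread stands in for asynchronous one-sided writes, `threading.Event` flags stand in for per-layer notifications, and `overlapped_backward`, `compute_grad`, and `send_grad` are hypothetical names introduced only for this example.

```python
import queue
import threading


def overlapped_backward(num_layers, compute_grad, send_grad):
    """Run a toy backward pass whose gradient exchange overlaps with compute.

    compute_grad(i) returns the gradient of layer i; send_grad(i, g) stands in
    for an asynchronous one-sided write. Both are hypothetical callbacks.
    """
    ready = [threading.Event() for _ in range(num_layers)]  # per-layer flags
    outbox = queue.Queue()

    def comm_worker():
        # Drain gradients in the background while the main thread computes.
        while True:
            item = outbox.get()
            if item is None:  # sentinel: the backward pass is finished
                return
            layer, grad = item
            send_grad(layer, grad)
            ready[layer].set()  # notify only this layer, no global barrier

    worker = threading.Thread(target=comm_worker)
    worker.start()

    # The backward pass visits layers last to first; each gradient is queued
    # for sending as soon as it exists.
    for layer in reversed(range(num_layers)):
        outbox.put((layer, compute_grad(layer)))
    outbox.put(None)

    # The next forward pass needs layer 0 first, so wait per layer in that order.
    for layer in range(num_layers):
        ready[layer].wait()
    worker.join()


received = {}
overlapped_backward(
    3,
    compute_grad=lambda i: [float(i)] * 2,
    send_grad=lambda i, g: received.__setitem__(i, g),
)
```

After the call, `received` holds one gradient per layer, sent in backward-pass order. In this sketch the main thread never waits for all communication at once: it blocks only on the flag of the layer it needs next, which is the fine-granular synchronization idea in miniature.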

Taxonomy

Topics: Advanced Neural Network Applications · Parallel Computing and Optimization Techniques · Neural Networks and Applications