Pushing the boundaries of parallel Deep Learning -- A practical approach
Paolo Viviani, Maurizio Drocco, Marco Aldinucci

TL;DR
This paper evaluates current data parallel deep learning training methods, proposes a practical C++ library to unify these approaches, and enables researchers to explore performance improvements within familiar workflows.
Contribution
It introduces a unified, performance-conscious C++ library for parallel deep learning training, facilitating experimentation with advanced strategies.
Findings
Assessment of state-of-the-art data parallel training methods
Design of a practical, unified C++ library
Enabling exploration of performance improvements
Abstract
This work assesses the state of the art in data-parallel deep neural network training, seeking to identify research directions that could be exploited to improve performance. In addition, it presents the design of a practical C++ library dedicated to implementing and unifying current state-of-the-art methodologies for parallel training in a performance-conscious framework, allowing users to explore novel strategies without departing significantly from their usual workflow.
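To make the term concrete, the following is a minimal sketch of one common form of data-parallel training (synchronous SGD with gradient averaging), not the design of the library described in the paper. All names here (`Sample`, `shard_gradient`, `data_parallel_step`, `train`) are hypothetical, the model is a one-parameter linear fit chosen only for brevity, and the workers are simulated sequentially in a loop where a real system would use an all-reduce across processes or devices.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical illustration only; not the API of the library in the paper.
struct Sample {
    double x;
    double y;
};

// Mean gradient of the loss 0.5 * (w * x - y)^2 over one worker's data shard.
double shard_gradient(double w, const std::vector<Sample>& shard) {
    if (shard.empty()) return 0.0;
    double g = 0.0;
    for (const Sample& s : shard) {
        g += (w * s.x - s.y) * s.x;
    }
    return g / static_cast<double>(shard.size());
}

// One synchronous data-parallel step: each worker computes a gradient on its
// own shard, the gradients are averaged (a stand-in for an all-reduce), and
// every replica applies the same update, so all copies of w stay identical.
double data_parallel_step(double w,
                          const std::vector<std::vector<Sample>>& shards,
                          double lr) {
    if (shards.empty()) return w;
    double sum = 0.0;
    for (const std::vector<Sample>& shard : shards) {
        sum += shard_gradient(w, shard);  // simulated worker, run sequentially
    }
    const double avg = sum / static_cast<double>(shards.size());
    return w - lr * avg;
}

// Repeat the step a fixed number of times, starting from w0.
double train(const std::vector<std::vector<Sample>>& shards,
             double w0, double lr, int steps) {
    double w = w0;
    for (int i = 0; i < steps; ++i) {
        w = data_parallel_step(w, shards, lr);
    }
    return w;
}
```

With two shards drawn from y = 3x, calling `train(shards, 0.0, 0.05, 100)` should drive `w` toward 3. Averaging before the update is what keeps the replicas in lockstep; asynchronous or stale-gradient variants relax exactly this step.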
Taxonomy
Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Parallel Computing and Optimization Techniques
