Hybrid Approach to Parallel Stochastic Gradient Descent
Aakash Sudhirbhai Vora, Dhrumil Chetankumar Joshi, Aksh Kantibhai Patel

TL;DR
This paper introduces a hybrid data-parallelism method for stochastic gradient descent that combines synchronous and asynchronous parameter aggregation and, within a given training time, outperforms either approach alone when training neural networks.
Contribution
It proposes a hybrid data-parallelism approach in which a threshold function gradually shifts parameter aggregation from asynchronous to synchronous over the course of training.
Findings
Within a given time period, the hybrid approach outperforms both the purely synchronous and the purely asynchronous methods.
An appropriately chosen threshold function balances the drawbacks of the two methods.
Experimental results demonstrate improved neural network training efficiency.
Abstract
Stochastic gradient descent (SGD) is used to train models on large datasets because it reduces training time. On top of that, data parallelism is widely used to train neural networks efficiently across multiple worker nodes in parallel. Most systems implement data parallelism with either a synchronous or an asynchronous approach, but each has its drawbacks. We propose a third approach, a hybrid of the two that uses both to train the neural network. We show that when the threshold function is chosen appropriately, so that parameter aggregation shifts gradually from fully asynchronous to fully synchronous, our hybrid approach outperforms both the asynchronous and the synchronous approaches within a given time period.
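The abstract does not spell out the threshold function, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It simulates a parameter server sequentially in plain Python: workers take turns (round-robin) computing a gradient on their own, possibly stale, copy of the parameter, and the server applies an update once the number of buffered gradients reaches a threshold. The names `threshold` and `hybrid_sgd`, the linear schedule, the 1-D regression task, and all hyperparameters are hypothetical choices made for illustration.

```python
import random


def threshold(step, total_steps, num_workers):
    """Hypothetical linear schedule (an assumption, not taken from the paper).

    Returns how many worker gradients the server waits for before updating:
    1 at the start (asynchronous-style) rising to num_workers at the end
    (synchronous-style).
    """
    return 1 + (step * num_workers) // total_steps


def hybrid_sgd(num_workers=4, total_steps=2000, lr=0.2, seed=0):
    """Sequential simulation of hybrid aggregation on the toy task y = 3 * x."""
    rng = random.Random(seed)
    true_w = 3.0
    w = 0.0                        # parameter held by the server
    local_w = [w] * num_workers    # each worker's possibly stale copy
    buffer = []                    # (worker_id, gradient) pairs awaiting aggregation
    schedule = []                  # threshold used at each step, for inspection

    for step in range(total_steps):
        worker = step % num_workers            # round-robin stand-in for real concurrency
        x = rng.uniform(-1.0, 1.0)
        y = true_w * x
        # Gradient of 0.5 * (w * x - y)**2 with respect to w, evaluated on the
        # worker's local copy, which may lag behind the server's parameter.
        grad = (local_w[worker] * x - y) * x
        buffer.append((worker, grad))

        k = threshold(step, total_steps, num_workers)
        schedule.append(k)
        if len(buffer) >= k:
            # Average the buffered gradients and apply a single update.
            w -= lr * sum(g for _, g in buffer) / len(buffer)
            # Only the contributing workers pull the fresh parameter.
            for worker_id, _ in buffer:
                local_w[worker_id] = w
            buffer.clear()

    return w, schedule


if __name__ == "__main__":
    final_w, schedule = hybrid_sgd()
    print("final parameter:", final_w)
    print("first and last thresholds:", schedule[0], schedule[-1])
```

With a threshold of 1, every gradient is applied as soon as it arrives, even though it was computed on a stale copy, which mimics asynchronous training. With a threshold equal to the number of workers, the server waits for every worker and all of them resume from the same parameter, which mimics synchronous training. Any other non-decreasing schedule could be substituted for the linear one used here.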
Taxonomy
Topics: Spectroscopy Techniques in Biomedical and Chemical Research · Stochastic Gradient Optimization Techniques · Medical Image Segmentation Techniques
