Data optimization for large batch distributed training of deep neural networks

Shubhankar Gahlot; Junqi Yin; Mallikarjun Shankar

arXiv:2012.09272 · cs.LG · December 21, 2020

TL;DR

This paper proposes a data optimization method using machine learning to improve large batch distributed training of deep neural networks by smoothing the loss landscape and filtering less important data points, leading to faster training and better accuracy.

Contribution

It introduces a novel data filtering approach that enhances large batch training efficiency and accuracy by implicitly smoothing the loss landscape.

Findings

1. Faster training with larger batch sizes
2. Improved model accuracy in distributed training
3. Effective data filtering reduces training time

Abstract

Distributed training in deep learning (DL) is common practice as data and models grow. Current practice for distributed training of deep neural networks faces two challenges: communication bottlenecks when operating at scale, and deteriorating model accuracy as the global batch size increases. Present solutions focus on improving message exchange efficiency and on techniques that tweak batch sizes and models during training. The loss of training accuracy typically happens because training gets trapped in a local minimum of the loss function. We observe that loss landscape minimization is shaped by both the model and the training data, and propose a data optimization approach that uses machine learning to implicitly smooth out the loss landscape, resulting in fewer local minima. Our approach filters out data points which are less important to feature learning,…
