Online Batch Selection for Faster Training of Neural Networks

Ilya Loshchilov; Frank Hutter

arXiv:1511.06343 · cs.LG · April 26, 2016 · 171 cites

TL;DR

This paper explores online batch selection strategies for neural network training, demonstrating that selecting batches based on loss ranking can significantly accelerate convergence of optimizers like AdaDelta and Adam.

Contribution

The paper introduces a simple loss-based ranking strategy for online batch selection that improves training speed for stochastic gradient methods.

Findings

01 Online batch selection speeds up training with AdaDelta and Adam by a factor of about 5.

02 The proposed ranking strategy effectively controls selection pressure.

03 Results are demonstrated on the MNIST dataset.

Abstract

Deep neural networks are commonly trained using stochastic non-convex optimization procedures, which are driven by gradient information estimated on fractions (batches) of the dataset. While it is commonly accepted that batch size is an important parameter for offline tuning, the benefits of online selection of batches remain poorly understood. We investigate online batch selection strategies for two state-of-the-art methods of stochastic gradient-based optimization, AdaDelta and Adam. As the loss function to be minimized for the whole dataset is an aggregation of loss functions of individual datapoints, intuitively, datapoints with the greatest loss should be considered (selected in a batch) more frequently. However, the limitations of this intuition and the proper control of the selection pressure over time are open questions. We propose a simple strategy where all datapoints are…
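
The intuition in the abstract, that datapoints with greater loss should be selected more often under a controlled selection pressure, can be illustrated with rank-based sampling. The following is a minimal Python sketch, not the authors' implementation: it assumes that selection probability decays exponentially with a datapoint's rank by latest known loss, and the names `rank_selection_probs`, `select_batch`, and the `pressure` parameter are hypothetical.

```python
import math
import random


def rank_selection_probs(n, pressure):
    """Return selection probabilities for ranks 0..n-1 (rank 0 = highest loss).

    Assumption: probability decays exponentially with rank, so that the
    top-ranked datapoint is roughly `pressure` times more likely to be
    picked than the bottom-ranked one. pressure = 1.0 gives uniform sampling.
    """
    weights = [math.exp(-math.log(pressure) * rank / n) for rank in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


def select_batch(latest_loss, batch_size, pressure, rng):
    """Sample datapoint indices, favoring those with the greatest latest known loss."""
    # Indices of datapoints sorted by descending loss.
    order = sorted(range(len(latest_loss)), key=lambda i: latest_loss[i], reverse=True)
    probs = rank_selection_probs(len(order), pressure)
    # Sample ranks with replacement, then map ranks back to datapoint indices.
    picked_ranks = rng.choices(range(len(order)), weights=probs, k=batch_size)
    return [order[r] for r in picked_ranks]


rng = random.Random(0)
latest_loss = [rng.random() for _ in range(1000)]  # placeholder per-datapoint losses
batch = select_batch(latest_loss, batch_size=64, pressure=100.0, rng=rng)
```

In this sketch, lowering `pressure` toward 1.0 recovers uniform batch sampling, which is one simple way to vary the selection pressure over the course of training.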

Code & Models

Repositories

Lasagne/Lasagne
none · Official

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Machine Learning and Data Classification

Methods: AdaDelta · Adam