Cost-Sensitive Approach to Batch Size Adaptation for Gradient Descent
Matteo Pirotta, Marcello Restelli

TL;DR
This paper introduces a cost-sensitive method for automatically adapting batch size in stochastic gradient descent by optimizing the trade-off between gradient estimate accuracy and computational cost, demonstrated on classification tasks.
Contribution
It presents an automated batch size adaptation technique that selects the batch size by maximizing the ratio between a lower bound on the expected improvement and the number of samples used for each update, and compares it empirically with related methods.
Findings
Empirically outperforms related batch size methods on classification tasks.
Demonstrates effective automatic batch size tuning in stochastic gradient descent.
Provides a practical approach for balancing accuracy and computational cost.
Abstract
In this paper, we propose a novel approach to automatically determine the batch size in stochastic gradient descent methods. The choice of the batch size induces a trade-off between the accuracy of the gradient estimate and the cost in terms of samples of each update. We propose to determine the batch size by optimizing the ratio between a lower bound to a linear or quadratic Taylor approximation of the expected improvement and the number of samples used to estimate the gradient. The performance of the proposed approach is empirically compared with related methods on popular classification tasks. The work was presented at the NIPS workshop on Optimizing the Optimizers, Barcelona, Spain, 2016.
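The ratio described in the abstract can be illustrated with a minimal sketch. It is not the paper's actual bound: the error model (gradient estimation error shrinking as `noise_scale / sqrt(n)`), the function names, and all parameters below are assumptions chosen only to show how a "lower bound on improvement per sample" objective yields a finite batch size.

```python
import math


def improvement_lower_bound(n, grad_norm, noise_scale, step_size):
    """Hypothetical lower bound on the first-order improvement of one step.

    Assumption (not from the paper): with n samples, the gradient estimate
    is within noise_scale / sqrt(n) of the true gradient in norm.
    """
    error = noise_scale / math.sqrt(n)
    # Linear Taylor improvement is step_size * <true_grad, est_grad>, which is
    # at least step_size * grad_norm * (grad_norm - error) under the assumption.
    return step_size * grad_norm * max(grad_norm - error, 0.0)


def best_batch_size(grad_norm, noise_scale, step_size, candidates):
    """Pick the candidate batch size maximizing bound / number of samples."""
    return max(
        candidates,
        key=lambda n: improvement_lower_bound(n, grad_norm, noise_scale, step_size) / n,
    )


if __name__ == "__main__":
    n_star = best_batch_size(
        grad_norm=1.0, noise_scale=4.0, step_size=0.1, candidates=range(1, 1025)
    )
    print(n_star)
```

Under this assumed error model the maximizer has the closed form 9 * noise_scale^2 / (4 * grad_norm^2), so a smaller gradient norm or noisier samples push the selected batch size up, which matches the accuracy-versus-cost trade-off the abstract describes.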
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and Algorithms · Machine Learning and ELM
