A non-monotone trust-region method with noisy oracles and additional sampling
Natasa Krejic, Natasa Krklec Jerinkic, Angeles Martinez, Mahsa Yousefi

TL;DR
This paper presents a novel stochastic second-order trust-region method with adaptive sampling for training deep neural networks, demonstrating improved efficiency and convergence in non-convex optimization tasks.
Contribution
The paper introduces an adaptive sample size strategy within a non-monotone trust-region framework for noisy second-order optimization in deep learning.
Findings
In the reported numerical experiments, the method outperforms state-of-the-art methods on neural network training tasks.
It requires fewer gradient evaluations to reach comparable or better accuracy.
It achieves almost sure convergence under standard assumptions for the trust-region framework.
Abstract
In this work, we introduce a novel stochastic second-order method, within the framework of a non-monotone trust-region approach, for solving the unconstrained, nonlinear, and non-convex optimization problems arising in the training of deep neural networks. The proposed algorithm makes use of subsampling strategies which yield noisy approximations of the finite sum objective function and its gradient. To effectively control the resulting approximation error, we introduce an adaptive sample size strategy based on inexpensive additional sampling. Depending on the estimated progress of the algorithm, this can yield sample size scenarios ranging from mini-batch to full sample functions. We provide convergence analysis for all possible scenarios and show that the proposed method achieves almost sure convergence under standard assumptions for the trust-region framework. We report numerical…
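The ingredients named in the abstract (subsampled function and gradient estimates, a non-monotone acceptance test, and a cheap additional sample that can trigger a larger sample size) can be illustrated with a minimal sketch. This is not the authors' algorithm. The toy least-squares problem, the Cauchy-point step, the max-over-recent-values reference, the "double the batch when the extra sample does not confirm the decrease" rule, and all names and constants (`make_problem`, `train`, `extra`, `memory`, `eta`) are assumptions made here for illustration only.

```python
import numpy as np

def make_problem(n=400, d=5, seed=0):
    """Synthetic least-squares data standing in for a finite-sum training loss."""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(n, d))
    b = A @ rng.normal(size=d) + 0.1 * rng.normal(size=n)
    return A, b

def f_and_grad(w, A, b, idx):
    """Subsampled loss and gradient over the index set idx."""
    r = A[idx] @ w - b[idx]
    return 0.5 * np.mean(r ** 2), A[idx].T @ r / len(idx)

def cauchy_step(g, B, delta):
    """Minimizer of the quadratic model along -g inside the ball of radius delta."""
    gnorm = np.linalg.norm(g)
    if gnorm == 0.0:
        return np.zeros_like(g)
    curv = g @ B @ g
    tau = 1.0 if curv <= 0 else min(1.0, gnorm ** 3 / (delta * curv))
    return -(tau * delta / gnorm) * g

def train(A, b, iters=200, batch=16, extra=4, memory=5, eta=0.1, seed=1):
    """Hypothetical subsampled non-monotone trust-region loop (illustrative only)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    w, delta, history = np.zeros(d), 1.0, []
    for _ in range(iters):
        idx = rng.choice(n, size=batch, replace=False)
        f, g = f_and_grad(w, A, b, idx)
        B = A[idx].T @ A[idx] / batch        # subsampled Hessian of the toy loss
        p = cauchy_step(g, B, delta)
        pred = -(g @ p + 0.5 * p @ B @ p)    # decrease predicted by the model
        if pred <= 0.0:
            break                            # zero gradient on this sample
        f_new, _ = f_and_grad(w + p, A, b, idx)
        ref = max(history[-memory:] + [f])   # non-monotone reference value (assumed form)
        rho = (ref - f_new) / pred
        confirmed = True
        if batch < n:
            # Additional sampling: a small independent sample checks the trial step.
            chk = rng.choice(n, size=extra, replace=False)
            confirmed = f_and_grad(w + p, A, b, chk)[0] <= f_and_grad(w, A, b, chk)[0]
            if not confirmed:
                batch = min(n, 2 * batch)    # assumed rule: grow the sample size
        if rho >= eta and confirmed:
            w = w + p
            history.append(f_new)
        if rho >= 0.75:
            delta = min(2.0 * delta, 10.0)   # expand the radius, capped
        elif rho < eta:
            delta *= 0.5                     # shrink the radius
    return w, batch

if __name__ == "__main__":
    A, b = make_problem()
    w, batch = train(A, b)
    everything = np.arange(len(b))
    print("final batch size:", batch)
    print("full loss:", f_and_grad(w, A, b, everything)[0])
```

Because the toy loss is quadratic and the sketch uses its exact subsampled Hessian, the model reproduces the subsampled loss exactly, so the ratio test here mainly shows where the non-monotone reference enters. The part that matters for the adaptive behaviour is the extra-sample check, which lets the batch size range from a mini-batch up to the full sample.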
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Machine Learning and ELM
