A non-monotone trust-region method with noisy oracles and additional sampling

Natasa Krejic; Natasa Krklec Jerinkic; Angeles Martinez; Mahsa Yousefi

arXiv:2307.10038 · math.OC · January 18, 2024 · Comput. Optim. Appl.

Open Access

TL;DR

This paper presents a novel stochastic second-order trust-region method with adaptive sampling for training deep neural networks, demonstrating improved efficiency and convergence in non-convex optimization tasks.

Contribution

It introduces an adaptive sample size strategy within a non-monotone trust-region framework for noisy second-order optimization in deep learning.

Findings

01 Outperforms state-of-the-art methods in neural network training tasks.

02 Requires fewer gradient evaluations for comparable or better accuracy.

03 Achieves almost sure convergence under standard assumptions for the trust-region framework.

Abstract

In this work, we introduce a novel stochastic second-order method, within the framework of a non-monotone trust-region approach, for solving the unconstrained, nonlinear, and non-convex optimization problems arising in the training of deep neural networks. The proposed algorithm makes use of subsampling strategies which yield noisy approximations of the finite sum objective function and its gradient. To effectively control the resulting approximation error, we introduce an adaptive sample size strategy based on inexpensive additional sampling. Depending on the estimated progress of the algorithm, this can yield sample size scenarios ranging from mini-batch to full sample functions. We provide convergence analysis for all possible scenarios and show that the proposed method achieves almost sure convergence under standard assumptions for the trust-region framework. We report numerical…
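
The ingredients named in the abstract (subsampled function and gradient estimates, a non-monotone trust-region acceptance test, and a sample size that grows when a cheap additional sample disagrees with the step) can be illustrated with a minimal sketch. This is not the authors' algorithm: the least-squares objective, the Cauchy-point step, the max-of-recent-values reference, the doubling rule for the sample size, and every name and default (`nonmonotone_tr`, `n_add`, `memory`, `eta`) are assumptions chosen only to keep the example short and runnable.

```python
import numpy as np

def subsampled_loss_grad(w, X, y, idx):
    """Least-squares loss and gradient on the rows selected by idx."""
    r = X[idx] @ w - y[idx]
    return 0.5 * np.mean(r ** 2), X[idx].T @ r / len(idx)

def cauchy_step(g, B, delta):
    """Minimize the quadratic model along -g inside the ball of radius delta."""
    gnorm = np.linalg.norm(g)
    if gnorm == 0.0:
        return np.zeros_like(g)
    gBg = g @ B @ g
    tau = 1.0 if gBg <= 0 else min(1.0, gnorm ** 3 / (delta * gBg))
    return -(tau * delta / gnorm) * g

def nonmonotone_tr(X, y, n0=8, n_add=2, delta=1.0, eta=0.1,
                   memory=5, iters=200, seed=0):
    """Illustrative loop only; all rules below are assumptions, not the paper's."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    w, n, history = np.zeros(d), min(n0, N), []
    for _ in range(iters):
        idx = rng.choice(N, size=n, replace=False)      # current subsample
        f, g = subsampled_loss_grad(w, X, y, idx)
        B = X[idx].T @ X[idx] / n                       # subsampled Hessian
        s = cauchy_step(g, B, delta)
        pred = -(g @ s + 0.5 * s @ B @ s)               # model decrease
        if pred <= 1e-16:
            if n == N:
                break                                   # full sample, no progress
            n = min(2 * n, N)                           # assumed growth rule
            continue
        f_trial, _ = subsampled_loss_grad(w + s, X, y, idx)
        # Non-monotone reference: the largest of the recent accepted values.
        f_ref = max(history[-memory:] + [f])
        rho = (f_ref - f_trial) / pred
        # Additional sampling: a small independent sample checks the step.
        idx_add = rng.choice(N, size=n_add, replace=False)
        f_add, _ = subsampled_loss_grad(w, X, y, idx_add)
        f_add_trial, _ = subsampled_loss_grad(w + s, X, y, idx_add)
        if f_add_trial > f_add:
            n = min(2 * n, N)                           # disagreement: grow sample
        if rho >= eta:                                  # accept the step
            w = w + s
            history.append(f_trial)
        if rho >= 0.75:
            delta = min(2.0 * delta, 1e3)               # expand radius
        elif rho < eta:
            delta *= 0.5                                # shrink radius
    return w, n
```

Calling `nonmonotone_tr(X, y)` on a data matrix `X` and targets `y` returns the final iterate and the sample size reached. Depending on how often the additional sample disagrees with the step, the sample size may stay at the mini-batch level or grow up to the full sample, which mirrors the range of scenarios the abstract describes.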

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Machine Learning and ELM