Finite-Sum Optimization: A New Perspective for Convergence to a Global Solution

Lam M. Nguyen; Trang H. Tran; Marten van Dijk

arXiv:2202.03524·cs.LG·February 9, 2022

TL;DR

This paper reformulates the finite-sum minimization problem that arises in training deep neural networks, builds a new recursive algorithmic framework on that reformulation, and proves convergence to an ε-global minimum under bounded-style assumptions.

Contribution

It presents a new perspective on convergence analysis for non-convex DNN training, along with an algorithmic framework whose theoretical guarantees hold under bounded-style assumptions.

Findings

01

Proves convergence to an ε-global minimum using $\tilde{O}(1/ε^{3})$ gradient computations.

02

Introduces a reformulation enabling a recursive optimization framework.

03

Broadens understanding of conditions for global convergence in DNN training.

Abstract

Deep neural networks (DNNs) have shown great success in many machine learning tasks. Their training is challenging since the loss surface of the network architecture is generally non-convex, or even non-smooth. How and under what assumptions is guaranteed convergence to a global minimum possible? We propose a reformulation of the minimization problem allowing for a new recursive algorithmic framework. By using bounded-style assumptions, we prove convergence to an $ε$-(global) minimum using $\tilde{O}(1/ε^{3})$ gradient computations. Our theoretical foundation motivates further study, implementation, and optimization of the new algorithmic framework and further investigation of its non-standard bounded-style assumptions. This new direction broadens our understanding of why and under what circumstances training of a DNN converges to a global minimum.
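
For orientation, the setting the abstract refers to can be written in standard notation. This is only a sketch of the generic finite-sum problem and the usual meaning of an $ε$-global minimum. The symbols $w$, $d$, $n$, $f_i$, $F$, $\hat{w}$, and $F_*$ are assumed here for illustration and are not taken from the paper, whose own reformulation is not reproduced on this page.

```latex
% Generic finite-sum objective: the average of n per-sample losses f_i
\min_{w \in \mathbb{R}^d} \; F(w) = \frac{1}{n} \sum_{i=1}^{n} f_i(w)

% \hat{w} is an \varepsilon-global minimum if its objective value
% is within \varepsilon of the optimal value F_*
F(\hat{w}) - F_* \le \varepsilon, \qquad F_* = \inf_{w} F(w)
```

Under this reading, the stated $\tilde{O}(1/ε^{3})$ bound counts the gradient computations needed to reach such a point, where $\tilde{O}$ conventionally hides logarithmic factors.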

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and Algorithms · Machine Learning and ELM