AGGLIO: Global Optimization for Locally Convex Functions

Debojyoti Dey; Bhaskar Mukhoty; Purushottam Kar

arXiv:2111.03932 · math.OC · November 9, 2021

TL;DR

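AGGLIO is a stage-wise, graduated optimization method with provable global convergence for non-convex objectives that are only locally convex, including learning problems that use sigmoid, softplus or SiLU activations. In the reported experiments it converges faster and to higher accuracy than several recently proposed alternatives.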

Contribution

Introduces AGGLIO, a stage-wise graduated optimization technique with provable global convergence for objectives that are locally convex but may fail to be even quasi-convex globally, including learning problems with sigmoid, softplus or SiLU activations.

Findings

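01. Outperformed several recently proposed optimization techniques for non-convex and locally convex objectives in convergence rate.

02. Reached higher convergent accuracy than those techniques in the same experiments.

03. Can be implemented with point as well as mini-batch SGD updates.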

Abstract

This paper presents AGGLIO (Accelerated Graduated Generalized LInear-model Optimization), a stage-wise, graduated optimization technique that offers global convergence guarantees for non-convex optimization problems whose objectives offer only local convexity and may fail to be even quasi-convex at a global scale. In particular, this includes learning problems that utilize popular activation functions such as sigmoid, softplus and SiLU that yield non-convex training objectives. AGGLIO can be readily implemented using point as well as mini-batch SGD updates and offers provable convergence to the global optimum in general conditions. In experiments, AGGLIO outperformed several recently proposed optimization techniques for non-convex and locally convex objectives in terms of convergence rate as well as convergent accuracy. AGGLIO relies on a graduation technique for generalized linear…
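To make the idea of stage-wise graduation concrete, the following is a minimal sketch, not the authors' algorithm. It assumes one plausible form of graduation: a sigmoid model is fit with mini-batch SGD on a squared loss while a temperature parameter `tau` scales the pre-activation and is raised to 1 over a few stages, each stage warm-starting from the previous one. The schedule, loss, step size, and the helper names `graduated_sgd`, `squared_loss` and `make_data` are all hypothetical choices made for illustration. The three activation functions named in the abstract are included for reference.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softplus(z):
    # log(1 + exp(z)), computed stably
    return np.logaddexp(0.0, z)

def silu(z):
    # Sigmoid Linear Unit: z * sigmoid(z)
    return z * sigmoid(z)

def squared_loss(w, X, y, tau=1.0):
    # Non-convex in w because of the sigmoid link
    return 0.5 * np.mean((sigmoid(tau * (X @ w)) - y) ** 2)

def graduated_sgd(X, y, temps=(0.25, 0.5, 1.0), steps=300,
                  lr=0.5, batch=32, seed=0):
    """Hypothetical graduated scheme (an assumption, not the paper's method):
    run mini-batch SGD in stages, raising the temperature tau each stage
    and warm-starting every stage from the previous stage's iterate."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for tau in temps:                      # one stage per temperature
        for _ in range(steps):
            idx = rng.choice(n, size=batch, replace=False)
            Xb, yb = X[idx], y[idx]
            p = sigmoid(tau * (Xb @ w))
            # Gradient of the mini-batch squared loss with respect to w
            grad = Xb.T @ ((p - yb) * p * (1.0 - p) * tau) / batch
            w -= lr * grad
    return w

def make_data(n=500, d=5, seed=1):
    # Synthetic noiseless data from a sigmoid model (illustrative only)
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    w_star = rng.standard_normal(d)
    return X, sigmoid(X @ w_star), w_star
```

Calling `graduated_sgd(*make_data()[:2])` returns a weight vector whose loss at `tau=1` can be compared against the loss at the zero initialization. Low temperatures flatten the sigmoid so that early stages see a smoother objective; whether this matches the graduation used in the paper should be checked against the official repository.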

Code & Models

Repositories

purushottamkar/agglio (official)

Taxonomy

Methods: Sigmoid Linear Unit · Stochastic Gradient Descent