A Convexity-dependent Two-Phase Training Algorithm for Deep Neural Networks
Tomas Hrycej, Bernhard Bermeitinger, Massimo Pavone, Götz-Henrik Wiegand, Siegfried Handschuh

TL;DR
This paper introduces a two-phase training algorithm for deep neural networks that switches from a non-convex optimizer such as Adam to a convex method such as Conjugate Gradient (CG) once the loss function becomes convex towards the optimum, improving convergence and accuracy.
Contribution
The paper proposes a novel two-phase optimization framework that detects convexity transitions in the loss landscape to enhance training efficiency.
Findings
The algorithm effectively detects the convexity swap point during training.
Experiments show improved convergence speed and accuracy.
The hypothesized structure, initial non-convexity giving way to convexity towards the optimum, is common in real-world tasks.
Abstract
The key task of machine learning is to minimize the loss function that measures the model fit to the training data. The numerical methods to do this efficiently depend on the properties of the loss function. The most decisive among these properties is the convexity or non-convexity of the loss function. The fact that the loss function can have, and frequently has, non-convex regions has led to a widespread commitment to non-convex methods such as Adam. However, a local minimum implies that, in some environment around it, the function is convex. In this environment, second-order minimizing methods such as the Conjugate Gradient (CG) give a guaranteed superlinear convergence. We propose a novel framework grounded in the hypothesis that loss functions in real-world tasks swap from initial non-convexity to convexity towards the optimum. This is a property we leverage to design an innovative…
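The two-phase idea in the abstract can be illustrated with a minimal pure-Python sketch. Everything in it is an assumption made for illustration: the toy loss, the weights `C`, the function names, and in particular the swap test. The abstract excerpt above does not state how the swap point is detected, so the sketch substitutes a simple heuristic (positive finite-difference curvature along the gradient for several consecutive steps), which is not the authors' criterion and only probes one direction rather than proving convexity. Phase 2 uses a Polak-Ribière nonlinear CG with backtracking line search as a stand-in for the CG method named in the abstract.

```python
import math

# Hypothetical toy loss: non-convex far from the origin, convex near its minimum at 0.
C = (1.0, 3.0)  # assumed per-coordinate weights

def loss(w):
    return sum(c * (1.0 - math.exp(-x * x)) for c, x in zip(C, w))

def grad(w):
    return [2.0 * c * x * math.exp(-x * x) for c, x in zip(C, w)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def curvature_along_gradient(w, eps=1e-4):
    """Finite-difference estimate of d^T H d for the unit gradient direction d."""
    g = grad(w)
    norm = math.sqrt(dot(g, g))
    if norm == 0.0:
        return 0.0
    d = [x / norm for x in g]
    g_shift = grad([x + eps * di for x, di in zip(w, d)])
    return dot([b - a for a, b in zip(g, g_shift)], d) / eps

def adam_phase(w, lr=0.05, b1=0.9, b2=0.999, eps=1e-8, patience=5, max_steps=2000):
    """Phase 1: Adam until the (assumed) curvature test passes `patience` times in a row."""
    m, v, streak = [0.0] * len(w), [0.0] * len(w), 0
    for t in range(1, max_steps + 1):
        g = grad(w)
        m = [b1 * mi + (1 - b1) * gi for mi, gi in zip(m, g)]
        v = [b2 * vi + (1 - b2) * gi * gi for vi, gi in zip(v, g)]
        w = [wi - lr * (mi / (1 - b1 ** t)) / (math.sqrt(vi / (1 - b2 ** t)) + eps)
             for wi, mi, vi in zip(w, m, v)]
        streak = streak + 1 if curvature_along_gradient(w) > 0 else 0
        if streak >= patience:
            return w, t
    return w, max_steps

def cg_phase(w, steps=200, tol=1e-10):
    """Phase 2: Polak-Ribiere+ nonlinear CG with Armijo backtracking line search."""
    g = grad(w)
    d = [-x for x in g]
    for _ in range(steps):
        if dot(g, g) < tol:
            break
        if dot(g, d) >= 0:  # not a descent direction: restart with steepest descent
            d = [-x for x in g]
        step, f0, slope = 1.0, loss(w), dot(g, d)
        while (loss([wi + step * di for wi, di in zip(w, d)]) > f0 + 1e-4 * step * slope
               and step > 1e-12):
            step *= 0.5
        w = [wi + step * di for wi, di in zip(w, d)]
        g_new = grad(w)
        beta = max(0.0, dot(g_new, [a - b for a, b in zip(g_new, g)]) / dot(g, g))
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        g = g_new
    return w

w0 = [1.5, -1.2]  # start in the non-convex region of the toy loss
w_swap, swap_step = adam_phase(w0)
w_final = cg_phase(w_swap)
print("swap after", swap_step, "Adam steps; loss:", loss(w0), "->", loss(w_swap), "->", loss(w_final))
```

On a real network the gradient would come from automatic differentiation and the curvature probe from a Hessian-vector product; the structure (run a non-convex method, test for a swap, hand over to CG) is the only part taken from the abstract.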
