Optimization on Submanifolds of Convolution Kernels in CNNs
Mete Ozay, Takayuki Okatani

TL;DR
This paper introduces a geometric framework to understand kernel normalization in CNNs, analyzes its effects on optimization landscapes, and proposes a convergent SGD algorithm that improves image classification performance.
Contribution
It develops a geometric understanding of kernel normalization methods and proposes a new SGD algorithm with convergence guarantees for CNN training.
Findings
The proposed method is reported to achieve state-of-the-art results on image classification benchmarks.
Theoretical analysis explains how normalization affects optimization geometry.
Convergence of the new SGD algorithm is theoretically guaranteed.
Abstract
Kernel normalization methods have been employed to improve the robustness of optimization methods to reparametrization of convolution kernels and to covariate shift, and to accelerate training of Convolutional Neural Networks (CNNs). However, our understanding of the theoretical properties of these methods has lagged behind their success in applications. We develop a geometric framework to elucidate the underlying mechanisms of a diverse range of kernel normalization methods. Our framework enables us to expound and identify the geometry of the space of normalized kernels. We analyze and delineate how state-of-the-art kernel normalization methods affect the geometry of the search spaces of stochastic gradient descent (SGD) algorithms in CNNs. Following our theoretical results, we propose an SGD algorithm with assurance of almost sure convergence of the methods to a solution at a single minimum of classification loss…
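To make the geometric idea in the abstract concrete, here is a minimal sketch of SGD constrained to one simple submanifold: the unit sphere, which is where a flattened kernel lives under unit-norm normalization. This is not the algorithm proposed in the paper. The choice of the sphere, the toy quadratic loss, the 1/(t+1) step-size schedule, and all function names (`tangent_project`, `sphere_sgd_step`, `toy_run`) are assumptions made purely for illustration.

```python
import math
import random


def dot(a, b):
    """Euclidean inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(w):
    """Map a nonzero vector onto the unit sphere (unit-norm normalization)."""
    n = math.sqrt(dot(w, w))
    return [x / n for x in w]


def tangent_project(w, g):
    """Project gradient g onto the tangent space of the sphere at unit vector w."""
    c = dot(w, g)
    return [gi - c * wi for gi, wi in zip(g, w)]


def sphere_sgd_step(w, g, lr):
    """One constrained step: move along the projected gradient, then
    retract back onto the sphere by renormalizing."""
    rg = tangent_project(w, g)
    return normalize([wi - lr * ri for wi, ri in zip(w, rg)])


def toy_run(steps=200, noise_std=0.05, seed=0):
    """Minimize 0.5 * ||w - target||^2 over the unit sphere using noisy
    gradients. The loss and the noise model are hypothetical stand-ins
    for a mini-batch classification loss."""
    rng = random.Random(seed)
    target = normalize([1.0, 2.0, -1.0, 0.5])
    w = normalize([1.0, 0.0, 0.0, 0.0])
    for t in range(1, steps + 1):
        # Exact gradient (w - target) plus Gaussian noise per coordinate.
        g = [wi - ti + rng.gauss(0.0, noise_std) for wi, ti in zip(w, target)]
        # Decaying step size; the schedule is an assumption of this sketch.
        w = sphere_sgd_step(w, g, lr=1.0 / (t + 1))
    return w, target


if __name__ == "__main__":
    w, target = toy_run()
    print("norm of w:", math.sqrt(dot(w, w)))
    print("cosine with target:", dot(w, target))
```

Projecting the gradient before stepping and renormalizing afterwards keeps every iterate on the sphere, so the search space is the curved set of normalized kernels rather than the full Euclidean parameter space. Other normalization schemes would correspond to other constraint sets and would need their own projection and retraction.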
Taxonomy
Topics: Neural Networks and Applications · Generative Adversarial Networks and Image Synthesis · Human Pose and Action Recognition
Methods: Convolution · Stochastic Gradient Descent
