# Optimization on Product Submanifolds of Convolution Kernels

**Authors:** Mete Ozay, Takayuki Okatani

arXiv: 1701.06123 · 2017-11-28

## TL;DR

This paper develops a geometry-aware SGD algorithm (G-SGD) that trains CNNs on ensembles of products of submanifolds (PEMs) of convolution kernels, and reports improved training loss, convergence, and classification performance on CIFAR-10, CIFAR-100, and ImageNet.

## Contribution

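The paper introduces an approach for training CNNs on ensembles of products of submanifolds (PEMs) of convolution kernels. It proposes three strategies for constructing these ensembles, describes their metric and curvature properties, develops a geometry-aware SGD algorithm (G-SGD) for optimization on them, and analyzes the convergence of G-SGD.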
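To make the idea of optimizing on a product of kernel submanifolds concrete, here is a minimal sketch of a plain Riemannian SGD step. It is not the authors' G-SGD. It assumes, purely for illustration, that every kernel is constrained to the unit sphere (one possible normalization constraint; this summary does not list the constraints the paper uses), and it uses a fixed step size rather than the geometry-adaptive step size the paper describes. The function names `sphere_sgd_step` and `product_sgd_step` are hypothetical.

```python
import math


def dot(u, v):
    """Euclidean inner product of two flattened kernels."""
    return sum(a * b for a, b in zip(u, v))


def normalize(w):
    """Scale a flattened kernel to unit Euclidean norm."""
    n = math.sqrt(dot(w, w))
    return [a / n for a in w]


def sphere_sgd_step(w, grad, lr):
    """One Riemannian SGD step for a unit-norm kernel w (assumed constraint)."""
    # Project the Euclidean gradient onto the tangent space at w: g - <g, w> w.
    c = dot(grad, w)
    rgrad = [g - c * a for g, a in zip(grad, w)]
    # Move against the tangent gradient, then retract to the sphere by renormalizing.
    return normalize([a - lr * r for a, r in zip(w, rgrad)])


def product_sgd_step(kernels, grads, lr):
    """Update each factor of the product independently with its own sphere step."""
    return [sphere_sgd_step(w, g, lr) for w, g in zip(kernels, grads)]


# Two toy kernels, each a point on its own unit sphere, with made-up gradients.
kernels = [normalize([1.0, 2.0, 2.0]), normalize([0.0, 3.0, 4.0])]
grads = [[0.5, -0.1, 0.2], [1.0, 0.0, -1.0]]
updated = product_sgd_step(kernels, grads, lr=0.1)
```

Because the update acts on each factor separately, every kernel stays on its own constraint set after the step. Mixing different constraints across kernels, or adapting the step size to the geometry, would change `sphere_sgd_step` but not this overall product structure.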

## Key findings

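- The geometry-adaptive step size computation in G-SGD improves the training loss and convergence properties of CNNs trained on CIFAR-10, CIFAR-100, and ImageNet.
- Classification performance of baseline CNNs improves when they are trained with G-SGD on ensembles of products of submanifolds (PEMs) identified by multiple constraints.
- The convergence analysis of G-SGD takes the geometric properties (metric and curvature) of PEMs into account.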

## Abstract

Recent advances in optimization methods used for training convolutional neural networks (CNNs) with kernels, which are normalized according to particular constraints, have shown remarkable success. This work introduces an approach for training CNNs using ensembles of joint spaces of kernels constructed using different constraints. For this purpose, we address a problem of optimization on ensembles of products of submanifolds (PEMs) of convolution kernels. To this end, we first propose three strategies to construct ensembles of PEMs in CNNs. Next, we expound their geometric properties (metric and curvature properties) in CNNs. We make use of our theoretical results by developing a geometry-aware SGD algorithm (G-SGD) for optimization on ensembles of PEMs to train CNNs. Moreover, we analyze convergence properties of G-SGD considering geometric properties of PEMs. In the experimental analyses, we employ G-SGD to train CNNs on CIFAR-10, CIFAR-100 and ImageNet datasets. The results show that geometric adaptive step size computation methods of G-SGD can improve training loss and convergence properties of CNNs. Moreover, we observe that classification performance of baseline CNNs can be boosted using G-SGD on ensembles of PEMs identified by multiple constraints.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1701.06123/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/1701.06123/full.md

## References

24 references — full list in the complete paper: https://tomesphere.com/paper/1701.06123/full.md

---
Source: https://tomesphere.com/paper/1701.06123