Channel-Directed Gradients for Optimization of Convolutional Neural Networks

Dong Lao; Peihao Zhu; Peter Wonka; Ganesh Sundaramoorthi

arXiv:2008.10766 · cs.LG · August 26, 2020 · 1 citation

TL;DR

This paper proposes a simple, efficient way to improve the generalization of convolutional neural networks: smooth the stochastic gradient along the output-channel direction of the parameter tensor before passing it to the optimizer. The method can be combined with any optimizer.

Contribution

It introduces a gradient modification technique based on output-channel directed re-weighted L2 or Sobolev metrics. The added cost is only linear in the number of parameters relative to computing the stochastic gradient.

Findings

01 Reduced generalization error across multiple benchmarks.

02 Smoothing the gradient along the output-channel direction boosts performance, while other directions can be detrimental.

03 The method is compatible with existing optimizers and architectures.

Abstract

We introduce optimization methods for convolutional neural networks that can be used to improve existing gradient-based optimization in terms of generalization error. The method requires only simple processing of existing stochastic gradients, can be used in conjunction with any optimizer, and has only a linear overhead (in the number of parameters) compared to computation of the stochastic gradient. The method works by computing the gradient of the loss function with respect to output-channel directed re-weighted L2 or Sobolev metrics, which has the effect of smoothing components of the gradient across a certain direction of the parameter tensor. We show that defining the gradients along the output channel direction leads to a performance boost, while other directions can be detrimental. We present the continuum theory of such gradients, its discretization, and application to deep…
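
The smoothing idea in the abstract can be illustrated with a small sketch. The code below is not the paper's re-weighted metric or its discretization; it swaps in the standard H1 Sobolev smoothing, which solves (I + lam * L) s = g along the output-channel axis, where L is a 1-D Laplacian with reflecting ends. The names `smooth_1d`, `smooth_along_output_channels`, and `lam`, and the layout of the gradient as `grad[c_out][k]` (all non-output dimensions flattened into `k`), are assumptions made for illustration. The tridiagonal solve keeps the cost linear in the number of parameters.

```python
def smooth_1d(g, lam):
    """Solve (I + lam * L) s = g in O(n) with the Thomas algorithm.

    L is the 1-D graph Laplacian with reflecting (Neumann) ends.
    lam >= 0 is a hypothetical smoothing strength; lam = 0 returns g.
    """
    n = len(g)
    if n == 1:
        return [float(g[0])]
    off = -lam                      # sub- and super-diagonal entries
    diag = [1.0 + 2.0 * lam] * n    # interior diagonal entries
    diag[0] = diag[-1] = 1.0 + lam  # boundary rows have one neighbour
    c = [0.0] * n                   # modified super-diagonal
    d = [0.0] * n                   # modified right-hand side
    c[0] = off / diag[0]
    d[0] = g[0] / diag[0]
    for i in range(1, n):           # forward elimination
        denom = diag[i] - off * c[i - 1]
        c[i] = off / denom
        d[i] = (g[i] - off * d[i - 1]) / denom
    s = [0.0] * n
    s[-1] = d[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        s[i] = d[i] - c[i] * s[i + 1]
    return s


def smooth_along_output_channels(grad, lam):
    """Smooth grad[c_out][k] along the c_out axis, independently for each k."""
    c_out, rest = len(grad), len(grad[0])
    out = [[0.0] * rest for _ in range(c_out)]
    for k in range(rest):
        column = smooth_1d([grad[o][k] for o in range(c_out)], lam)
        for o in range(c_out):
            out[o][k] = column[o]
    return out


# Toy gradient: 3 output channels, 2 remaining (flattened) entries each.
grad = [[1.0, 0.0], [0.0, 0.0], [0.0, 4.0]]
smoothed = smooth_along_output_channels(grad, lam=1.0)
```

In this sketch the smoothed gradient would replace the raw gradient before the optimizer step, which is why the choice of optimizer is unaffected. The operator preserves the sum of each column, so it redistributes gradient mass across neighbouring output channels rather than rescaling it.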

Taxonomy

Topics: Tensor decomposition and applications · Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques