Convergent Block Coordinate Descent for Training Tikhonov Regularized Deep Neural Networks

Ziming Zhang; Matthew Brand

arXiv:1711.07354·stat.ML·November 21, 2017·25 cites

Open Access

TL;DR

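This paper introduces a provably convergent block coordinate descent algorithm for training deep neural networks. It lifts the ReLU activation into a higher dimensional space to obtain a smooth multi-convex formulation, and on MNIST the resulting networks reach lower test-set error rates than the same architectures trained with SGD variants.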

Contribution

The paper develops a smooth multi-convex formulation of DNN training, proves that the resulting BCD algorithm converges globally to a stationary point, and reports lower test-set error rates on MNIST than SGD-trained networks of identical architecture.

Findings

01

The BCD algorithm converges globally to a stationary point, with an R-linear convergence rate.

02

On MNIST, DNNs trained with BCD achieve lower test-set error rates than identical architectures trained with the SGD variants in the Caffe toolbox.

03

Each BCD step reduces to a numerically well-behaved convex subproblem.

Abstract

By lifting the ReLU function into a higher dimensional space, we develop a smooth multi-convex formulation for training feed-forward deep neural networks (DNNs). This allows us to develop a block coordinate descent (BCD) training algorithm consisting of a sequence of numerically well-behaved convex optimizations. Using ideas from proximal point methods in convex analysis, we prove that this BCD algorithm will converge globally to a stationary point with R-linear convergence rate of order one. In experiments with the MNIST database, DNNs trained with this BCD algorithm consistently yielded better test-set error rates than identical DNN architectures trained via all the stochastic gradient descent (SGD) variants in the Caffe toolbox.
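
The two ideas in the abstract can be illustrated with a toy sketch. It is not the authors' formulation, which this page does not reproduce. The first function relies on a standard identity: ReLU(x) is the minimizer of (u - x)^2 over u >= 0, so a ReLU can be replaced by an auxiliary variable u with a smooth penalty and a nonnegativity constraint. The second function runs block coordinate descent on a hypothetical Tikhonov-regularized biconvex objective, (a*b - 1)^2 + lam*(a^2 + b^2), where each block update is a convex problem with a closed-form solution. All names and parameter values (lifted_relu, toy_bcd, lam, lr) are assumptions made for illustration.

```python
def relu(x):
    return max(0.0, x)


def lifted_relu(x, steps=200, lr=0.25):
    """Solve min over u >= 0 of (u - x)^2 by projected gradient descent.

    The minimizer equals ReLU(x); this is the identity that lets a ReLU
    be traded for an extra variable u plus a smooth quadratic term.
    """
    u = 0.0
    for _ in range(steps):
        grad = 2.0 * (u - x)           # gradient of (u - x)^2 in u
        u = max(0.0, u - lr * grad)    # gradient step, then project onto u >= 0
    return u


def objective(a, b, lam):
    # Toy biconvex loss with a Tikhonov (squared-norm) regularizer.
    return (a * b - 1.0) ** 2 + lam * (a * a + b * b)


def toy_bcd(a=0.5, b=0.5, lam=0.1, sweeps=50):
    """Alternate exact minimization over a (b fixed) and b (a fixed).

    With one block fixed, the objective is a strictly convex quadratic in
    the other block, so each update has a closed form.
    """
    history = [objective(a, b, lam)]
    for _ in range(sweeps):
        a = b / (b * b + lam)          # argmin over a with b fixed
        history.append(objective(a, b, lam))
        b = a / (a * a + lam)          # argmin over b with a fixed
        history.append(objective(a, b, lam))
    return a, b, history


if __name__ == "__main__":
    print(lifted_relu(1.5), relu(1.5))
    print(lifted_relu(-2.0), relu(-2.0))
    a, b, history = toy_bcd()
    print(a, b, history[0], history[-1])
```

Because every block update is an exact minimization, the recorded objective values never increase. That monotone decrease is the basic property that convergence arguments for BCD build on; the regularizer lam keeps each subproblem strictly convex and therefore well conditioned.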

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Numerical methods in inverse problems