A block coordinate descent optimizer for classification problems exploiting convexity

Ravi G. Patel; Nathaniel A. Trask; Mamikon A. Gulian; Eric C. Cyr

arXiv:2006.10123 · cs.LG · June 19, 2020

TL;DR

This paper introduces a hybrid Newton/Gradient Descent method that exploits convexity in the linear layer of deep neural networks, improving training efficiency and accuracy for classification tasks.

Contribution

The paper presents a coordinate descent optimizer that exploits the convexity of the cross-entropy loss in the weights of the linear layer. It alternates a second-order (Newton) step for the linear layer with gradient descent for the hidden layers.

Findings

1. Improved validation error on classification tasks
2. Qualitative differences in learned basis functions
3. Enhanced training accuracy on image benchmarks

Abstract

Second-order optimizers hold intriguing potential for deep learning, but suffer from increased cost and sensitivity to the non-convexity of the loss surface as compared to gradient-based approaches. We introduce a coordinate descent method to train deep neural networks for classification tasks that exploits global convexity of the cross-entropy loss in the weights of the linear layer. Our hybrid Newton/Gradient Descent (NGD) method is consistent with the interpretation of hidden layers as providing an adaptive basis and the linear layer as providing an optimal fit of the basis to data. By alternating between a second-order method to find globally optimal parameters for the linear layer and gradient descent to train the hidden layers, we ensure an optimal fit of the adaptive basis to data throughout training. The size of the Hessian in the second-order step scales only with the number…
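
The alternation described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the one-hidden-layer tanh network, the function names (`newton_linear_layer`, `train_ngd`), the ridge damping, the backtracking line search, and all hyperparameters are assumptions chosen to keep the example short. With the hidden-layer output `Phi` held fixed, the cross-entropy loss is convex in the linear-layer weights `W`, so a damped Newton iteration is applied to `W`; the hidden-layer parameters `V` and `b` are then updated by one plain gradient descent step.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # shift logits for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(P, Y):
    # Mean cross-entropy between predicted probabilities P and one-hot labels Y.
    return -np.mean(np.sum(Y * np.log(P + 1e-12), axis=1))

def newton_linear_layer(Phi, Y, W, n_steps=5, ridge=1e-3):
    """Damped Newton steps on W with the basis Phi held fixed (convex subproblem).

    The ridge term and the line search are assumptions of this sketch.
    """
    N, m = Phi.shape
    K = Y.shape[1]
    for _ in range(n_steps):
        P = softmax(Phi @ W)
        base = cross_entropy(P, Y)
        G = Phi.T @ (P - Y) / N  # gradient with respect to W, shape (m, K)
        # Dense Hessian, class-major blocks:
        # H[(k,i),(l,j)] = mean_n Phi[n,i] Phi[n,j] P[n,k] (delta_kl - P[n,l])
        H = np.zeros((K * m, K * m))
        for k in range(K):
            for l in range(K):
                w = P[:, k] * (float(k == l) - P[:, l])
                H[k*m:(k+1)*m, l*m:(l+1)*m] = (Phi * w[:, None]).T @ Phi / N
        H += ridge * np.eye(K * m)  # softmax Hessian is singular; damp it
        step = np.linalg.solve(H, G.T.reshape(-1)).reshape(K, m).T
        # Backtracking: accept the step only if the loss does not increase.
        t = 1.0
        while t > 1e-4:
            W_new = W - t * step
            if cross_entropy(softmax(Phi @ W_new), Y) <= base:
                W = W_new
                break
            t *= 0.5
    return W

def train_ngd(X, Y, width=8, n_outer=30, lr=0.5, seed=0):
    """Alternate Newton (linear layer) and gradient descent (hidden layer)."""
    rng = np.random.default_rng(seed)
    V = rng.normal(size=(X.shape[1], width))  # hidden-layer weights
    b = np.zeros(width)                       # hidden-layer biases
    W = np.zeros((width, Y.shape[1]))         # linear-layer weights
    history = []
    for _ in range(n_outer):
        Phi = np.tanh(X @ V + b)              # adaptive basis
        W = newton_linear_layer(Phi, Y, W)    # second-order step on W
        P = softmax(Phi @ W)
        history.append(cross_entropy(P, Y))
        # Backpropagate through the linear layer and tanh, then take one GD step.
        dPhi = (P - Y) @ W.T / len(X)
        dA = dPhi * (1.0 - Phi ** 2)
        V -= lr * X.T @ dA
        b -= lr * dA.sum(axis=0)
    return V, b, W, history

if __name__ == "__main__":
    # Toy data (hypothetical): three Gaussian blobs in the plane.
    rng = np.random.default_rng(1)
    centers = np.array([[0.0, 3.0], [3.0, -2.0], [-3.0, -2.0]])
    labels = np.repeat(np.arange(3), 40)
    X = centers[labels] + 0.5 * rng.normal(size=(120, 2))
    Y = np.eye(3)[labels]
    V, b, W, history = train_ngd(X, Y)
    print("first / last training loss:", history[0], history[-1])
```

The sketch forms the full dense Hessian over all linear-layer weights, which is only practical for a small final hidden layer and few classes. Reducing that cost would require a cheaper solve than the one shown here.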

Taxonomy

Methods: Linear Layer