Coordinate descent on the orthogonal group for recurrent neural network training

Estelle Massart; Vinayak Abrol

arXiv:2108.00051 · cs.LG · August 3, 2021

TL;DR

This paper introduces a stochastic Riemannian coordinate descent method on the orthogonal group for training recurrent neural networks. It proves convergence when coordinates are selected uniformly at random, observes that the Riemannian gradient is approximately sparse, and uses that observation to build a faster variant based on the Gauss-Southwell rule.

Contribution

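The paper proposes a stochastic Riemannian coordinate descent algorithm on the orthogonal group for RNN training, in which each step rotates two columns of the recurrent matrix. It provides a convergence proof for uniformly random coordinate selection and a faster Gauss-Southwell variant that exploits the approximate sparsity of the Riemannian gradient.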

Findings

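01 Proves convergence of the algorithm, for coordinates selected uniformly at random, under standard assumptions on the loss function, stepsize and minibatch noise.

02 Shows numerically that the Riemannian gradient in RNN training has an approximately sparse structure.

03 Demonstrates the effectiveness of the method on a benchmark RNN training problem.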
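
The link between a sparse Riemannian gradient and a greedy coordinate choice can be illustrated with a small sketch. It assumes that the coordinates of the Riemannian gradient at W are the strictly upper-triangular entries of the skew-symmetric part of W^T G, where G is the Euclidean gradient of the loss; the scaling convention and the names `skew_coordinates` and `gauss_southwell_pair` are illustrative choices, not taken from the paper or its repository. Under that assumption, the Gauss-Southwell rule picks the column pair whose coordinate has the largest magnitude.

```python
def skew_coordinates(W, G):
    """Return {(i, j): value} for i < j, where value is entry (i, j) of
    0.5 * (W^T G - G^T W). W and G are square lists of lists.
    The 0.5 scaling is an assumed convention."""
    n = len(W)
    # A = W^T G, computed entry by entry.
    A = [[sum(W[r][a] * G[r][b] for r in range(n)) for b in range(n)]
         for a in range(n)]
    return {(i, j): 0.5 * (A[i][j] - A[j][i])
            for i in range(n) for j in range(i + 1, n)}


def gauss_southwell_pair(W, G):
    """Pick the column pair whose gradient coordinate is largest in magnitude."""
    coords = skew_coordinates(W, G)
    return max(coords, key=lambda pair: abs(coords[pair]))


# Toy data (hypothetical): identity recurrent matrix and a hand-picked gradient.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
G = [[0.0, 2.0, 0.0], [0.0, 0.0, 0.1], [5.0, 0.0, 0.0]]

coords = skew_coordinates(W, G)
pair = gauss_southwell_pair(W, G)
print(coords)
print(pair)
```

When most coordinates are close to zero, as the approximate sparsity finding suggests, a greedy choice of this kind concentrates each update on a pair that actually matters, whereas uniform sampling would often pick a pair with a negligible coordinate.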

Abstract

We propose to use stochastic Riemannian coordinate descent on the orthogonal group for recurrent neural network training. The algorithm rotates successively two columns of the recurrent matrix, an operation that can be efficiently implemented as a multiplication by a Givens matrix. In the case when the coordinate is selected uniformly at random at each iteration, we prove the convergence of the proposed algorithm under standard assumptions on the loss function, stepsize and minibatch noise. In addition, we numerically demonstrate that the Riemannian gradient in recurrent neural network training has an approximately sparse structure. Leveraging this observation, we propose a faster variant of the proposed algorithm that relies on the Gauss-Southwell rule. Experiments on a benchmark recurrent neural network training problem are presented to demonstrate the effectiveness of the proposed…
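
The column-rotation step described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the sign convention of the rotation, and the fixed angles are assumptions made for the example. Right-multiplying by a Givens matrix changes only two columns, so the update costs O(n) per step and keeps the matrix orthogonal.

```python
import math


def rotate_columns(W, i, j, theta):
    """Right-multiply W (a square list of lists) by a Givens rotation of
    angle theta acting on columns i and j, in place. Only those two
    columns change. The sign convention is an assumption of this sketch."""
    c, s = math.cos(theta), math.sin(theta)
    for row in W:
        wi, wj = row[i], row[j]
        row[i] = c * wi - s * wj
        row[j] = s * wi + c * wj
    return W


def gram_error(W):
    """Largest absolute entry of W^T W - I (zero for an orthogonal W)."""
    n = len(W)
    err = 0.0
    for a in range(n):
        for b in range(n):
            dot = sum(W[r][a] * W[r][b] for r in range(n))
            err = max(err, abs(dot - (1.0 if a == b else 0.0)))
    return err


# Start from the identity and apply two rotations with arbitrary angles.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
rotate_columns(W, 0, 2, 0.3)
rotate_columns(W, 1, 2, -1.1)
print(gram_error(W))  # stays at rounding-error level
```

In an actual training loop the angle would come from the stepsize and the corresponding coordinate of the stochastic Riemannian gradient; the fixed angles here only demonstrate that orthogonality is preserved.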

Code & Models

Repositories

EMassart/OrthCDforRNNs (PyTorch, official)

Videos

Coordinate Descent on the Orthogonal Group for Recurrent Neural Network Training (Underline)

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Machine Learning and ELM