RNN Training along Locally Optimal Trajectories via Frank-Wolfe Algorithm

Yun Yue; Ming Li; Venkatesh Saligrama; Ziming Zhang

arXiv:2010.05397·cs.LG·October 16, 2020

TL;DR

This paper introduces a novel RNN training method using the Frank-Wolfe algorithm, achieving lower training costs and improved performance on benchmarks, especially with long-term dependencies and noisy data.

Contribution

It develops a new RNN training approach based on Frank-Wolfe, providing theoretical convergence guarantees and demonstrating empirical advantages over traditional back-propagation.

Findings

01 Lower overall training cost compared to back-propagation

02 Significant performance improvements on benchmark datasets

03 Effective training of deep RNN architectures and robustness to noise

Abstract

We propose a novel and efficient training method for RNNs that iteratively seeks a local minimum of the loss surface within a small region and uses the resulting directional vector for the update in an outer loop. We propose to utilize the Frank-Wolfe (FW) algorithm in this context. Although FW implicitly involves normalized gradients, which can lead to a slow convergence rate, we develop a novel RNN training method whose overall training cost, surprisingly, is empirically observed to be lower than that of back-propagation, even with the additional cost. Our method leads to a new Frank-Wolfe method that is in essence an SGD algorithm with a restart scheme. We prove that under certain conditions our algorithm has a sublinear convergence rate of $O(1/\epsilon)$ for $\epsilon$ error. We then conduct empirical experiments on several benchmark datasets including those that exhibit long-term…
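The idea in the abstract (Frank-Wolfe steps confined to a small region, followed by an outer-loop update along the resulting direction) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy quadratic loss, the choice of an L2 ball as the small region, the step size 2/(k+2), and every name and default value below are assumptions made for illustration. Over an L2 ball, the linear minimization step returns the ball's center shifted by the radius along the negative normalized gradient, which is one way to see why Frank-Wolfe implicitly involves normalized gradients.

```python
import math

# Toy quadratic loss standing in for an RNN training loss (assumption).
CURV = [1.0, 10.0]     # per-coordinate curvature
TARGET = [3.0, -2.0]   # minimizer of the toy loss


def loss(w):
    return 0.5 * sum(c * (x - t) ** 2 for c, x, t in zip(CURV, w, TARGET))


def grad(w):
    return [c * (x - t) for c, x, t in zip(CURV, w, TARGET)]


def lmo_l2_ball(g, center, radius):
    """Linear minimization oracle: argmin over s of <g, s>
    subject to ||s - center||_2 <= radius."""
    norm = math.sqrt(sum(x * x for x in g))
    if norm == 0.0:
        return list(center)
    # The solution is a step of length `radius` along the normalized negative gradient.
    return [c - radius * x / norm for c, x in zip(center, g)]


def local_frank_wolfe(w0, radius, inner_steps):
    """Run Frank-Wolfe iterations restricted to the ball around w0."""
    w = list(w0)
    for k in range(inner_steps):
        s = lmo_l2_ball(grad(w), w0, radius)
        gamma = 2.0 / (k + 2.0)  # classic Frank-Wolfe step size (assumed here)
        # Convex combination keeps the iterate inside the ball.
        w = [(1.0 - gamma) * x + gamma * y for x, y in zip(w, s)]
    return w


def train(w, radius=0.5, inner_steps=10, outer_steps=50, lr=1.0):
    """Outer loop: move along the direction found locally, then restart."""
    w = list(w)
    for _ in range(outer_steps):
        local = local_frank_wolfe(w, radius, inner_steps)
        direction = [a - b for a, b in zip(local, w)]
        w = [x + lr * d for x, d in zip(w, direction)]
    return w
```

Calling `train([0.0, 0.0])` runs 50 outer steps from the origin; comparing `loss` before and after gives a quick check that the iterates make progress on the toy problem. Each outer step re-centers the ball at the new iterate, which is the sense in which this sketch restarts the inner procedure.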

Code & Models

Repositories

YunYunY/FW_RNN_optimizer
PyTorch · Official

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Machine Learning and ELM · Stochastic Gradient Optimization Techniques

Methods: Stochastic Gradient Descent