On the Variance of Unbiased Online Recurrent Optimization

Tim Cooijmans; James Martens

arXiv:1902.02405 · cs.LG · February 8, 2019 · 6 citations

TL;DR

This paper analyzes the variance of the gradient estimate computed by the UORO algorithm for online RNN training. It proposes modifications that reduce this variance and explores the algorithm's theoretical connection to REINFORCE.

Contribution

The paper gives a detailed analysis of the variance of UORO's gradient estimate and introduces variance-reduction techniques. It also clarifies UORO's relationship to REINFORCE in the context of online RNN optimization.

Findings

01

Variance of UORO's gradient estimate can be significantly reduced.

02

The proposed modifications reduce this variance, both in theory and in practice.

03

Establishes a fundamental connection between UORO's gradient estimate and the estimate REINFORCE would compute if small amounts of noise were added to the RNN's hidden units.
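To make the variance question concrete, here is a minimal sketch of the random-sign rank-one step at the core of UORO. It replaces a sum a b^T + F by the outer product of two vectors built with Rademacher signs nu. The estimate is unbiased because E[nu nu^T] = I and the cross terms average to zero.

The rescaling by rho0 and rho1 follows the norm-balancing variance-reduction scheme of the original UORO paper (arXiv:1702.05043) as we understand it. It is not one of the new modifications proposed in this paper. The function names, dimensions, and the deliberately mismatched scales of a and b are illustrative assumptions.

```python
import numpy as np


def rank_one_sample(a, b, F, rng, balance=True, eps=1e-12):
    """Draw one random-sign rank-one estimate of a b^T + F.

    Returns outer(rho0*a + rho1*nu, b/rho0 + (nu^T F)/rho1). Because
    E[nu nu^T] = I and, with rho0 and rho1 even in nu, the cross terms
    are odd in nu, the expectation is exactly a b^T + F.
    """
    nu = rng.choice([-1.0, 1.0], size=a.shape[0])  # Rademacher signs
    c = nu @ F                                      # nu^T F, shape (p,)
    if balance:
        # Norm balancing (assumed to match the original UORO scheme):
        # makes the two cross terms equal in norm.
        rho0 = np.sqrt((np.linalg.norm(b) + eps) / (np.linalg.norm(a) + eps))
        rho1 = np.sqrt((np.linalg.norm(c) + eps) / (np.linalg.norm(nu) + eps))
    else:
        rho0 = rho1 = 1.0
    return np.outer(rho0 * a + rho1 * nu, b / rho0 + c / rho1)


def monte_carlo(a, b, F, balance, num_samples=20_000, seed=0):
    """Empirical mean and total (summed per-entry) variance of the estimator."""
    rng = np.random.default_rng(seed)
    samples = np.stack(
        [rank_one_sample(a, b, F, rng, balance) for _ in range(num_samples)]
    )
    return samples.mean(axis=0), samples.var(axis=0).sum()


# Hypothetical example with deliberately mismatched scales for a and b.
rng = np.random.default_rng(1)
n, p = 4, 3
a = 100.0 * rng.normal(size=n)
b = 0.01 * rng.normal(size=p)
F = rng.normal(size=(n, p))
target = np.outer(a, b) + F

mean_bal, var_bal = monte_carlo(a, b, F, balance=True)
mean_raw, var_raw = monte_carlo(a, b, F, balance=False)
print("max |mean - target| (balanced):", np.abs(mean_bal - target).max())
print("total variance, balanced vs. unbalanced:", var_bal, var_raw)
```

Both variants are unbiased. With mismatched scales, however, the unbalanced variant's variance is dominated by the a (nu^T F) cross term, which balancing shrinks by equalizing the two cross terms.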

Abstract

The recently proposed Unbiased Online Recurrent Optimization algorithm (UORO, arXiv:1702.05043) uses an unbiased approximation of RTRL to achieve fully online gradient-based learning in RNNs. In this work we analyze the variance of the gradient estimate computed by UORO, and propose several possible changes to the method which reduce this variance both in theory and practice. We also contribute significantly to the theoretical and intuitive understanding of UORO (and its existing variance reduction technique), and demonstrate a fundamental connection between its gradient estimate and the one that would be computed by REINFORCE if small amounts of noise were added to the RNN's hidden units.
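The following is a minimal sketch of the UORO recursion, checked against exact RTRL on a toy RNN h' = tanh(W h + x) in which only W is trained. It assumes the update rule and norm-balancing constants of the original UORO paper (arXiv:1702.05043) as we understand them. The model, sizes, seeds, and helper names (`rnn_step`, `step_jacobians`, `rtrl_jacobian`, `uoro_factors`) are hypothetical.

Averaging many independent runs should approach the exact RTRL Jacobian, which is what unbiasedness means here. Any single run, by contrast, can be far from it, which is the variance this paper analyzes.

```python
import numpy as np


def rnn_step(W, h, x):
    # Toy transition h' = tanh(W h + x); only W is treated as trainable.
    return np.tanh(W @ h + x)


def step_jacobians(W, h, x):
    # Jacobians of the transition w.r.t. previous state h and row-major vec(W).
    n = h.shape[0]
    d = 1.0 - np.tanh(W @ h + x) ** 2                      # tanh' at pre-activation
    F_h = d[:, None] * W                                   # shape (n, n)
    F_theta = d[:, None] * np.kron(np.eye(n), h[None, :])  # shape (n, n*n)
    return F_h, F_theta


def rtrl_jacobian(W, xs, h0):
    # Exact RTRL: J_{t+1} = F_h J_t + F_theta, with J_0 = 0 (h0 is fixed).
    n = h0.shape[0]
    h, J = h0, np.zeros((n, n * n))
    for x in xs:
        F_h, F_theta = step_jacobians(W, h, x)
        J = F_h @ J + F_theta
        h = rnn_step(W, h, x)
    return J


def uoro_factors(W, xs, h0, num_runs, rng, eps=1e-12):
    # Runs num_runs independent UORO estimators in parallel; run r approximates
    # the Jacobian dh_T/dvec(W) by outer(h_tilde[r], theta_tilde[r]).
    n = h0.shape[0]
    h = h0
    h_tilde = np.zeros((num_runs, n))
    theta_tilde = np.zeros((num_runs, n * n))
    for x in xs:
        F_h, F_theta = step_jacobians(W, h, x)
        nu = rng.choice([-1.0, 1.0], size=(num_runs, n))  # fresh Rademacher signs
        a = h_tilde @ F_h.T                                # F_h h_tilde per run
        c = nu @ F_theta                                   # nu^T F_theta per run
        rho0 = np.sqrt((np.linalg.norm(theta_tilde, axis=1) + eps)
                       / (np.linalg.norm(a, axis=1) + eps))[:, None]
        rho1 = np.sqrt((np.linalg.norm(c, axis=1) + eps)
                       / (np.linalg.norm(nu, axis=1) + eps))[:, None]
        h_tilde, theta_tilde = rho0 * a + rho1 * nu, theta_tilde / rho0 + c / rho1
        h = rnn_step(W, h, x)
    return h_tilde, theta_tilde


rng = np.random.default_rng(0)
n, T = 3, 4
W = rng.normal(scale=0.5, size=(n, n))
xs = rng.normal(size=(T, n))
h0 = rng.normal(size=n)

J_exact = rtrl_jacobian(W, xs, h0)
h_tilde_T, theta_tilde_T = uoro_factors(W, xs, h0, num_runs=100_000, rng=rng)
per_run = h_tilde_T[:, :, None] * theta_tilde_T[:, None, :]  # (runs, n, n*n)
scale = np.linalg.norm(J_exact)
rel_err_of_mean = np.linalg.norm(per_run.mean(axis=0) - J_exact) / scale
mean_rel_err_single = (np.linalg.norm(per_run - J_exact, axis=(1, 2)) / scale).mean()
print(f"relative error of the averaged estimate: {rel_err_of_mean:.3f}")
print(f"average relative error of a single run:  {mean_rel_err_single:.3f}")
```

Given a loss gradient dL/dh_T, the corresponding UORO gradient estimate for W would be (dL/dh_T · h_tilde) theta_tilde, reshaped to W's shape. The sketch stops at the Jacobian so that it can be compared directly with RTRL.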

Taxonomy

Topics: Machine Learning and ELM · Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning

Methods: Unbiased Online Recurrent Optimization