On the Variance of Unbiased Online Recurrent Optimization
Tim Cooijmans, James Martens

TL;DR
This paper analyzes the variance of the gradient estimate computed by UORO, an algorithm for fully online RNN training, proposes changes that reduce this variance, and relates the estimate to the one REINFORCE would compute.
Contribution
It provides a detailed variance analysis of UORO, introduces variance reduction techniques, and clarifies its relationship with REINFORCE in the context of online RNN optimization.
Findings
The variance of UORO's gradient estimate can be reduced, both in theory and in practice.
Proposed modifications improve the stability and efficiency of online RNN training.
UORO's gradient estimate is fundamentally connected to the estimate REINFORCE would compute if small amounts of noise were added to the RNN's hidden units.
Abstract
The recently proposed Unbiased Online Recurrent Optimization algorithm (UORO, arXiv:1702.05043) uses an unbiased approximation of RTRL to achieve fully online gradient-based learning in RNNs. In this work we analyze the variance of the gradient estimate computed by UORO, and propose several possible changes to the method which reduce this variance both in theory and practice. We also contribute significantly to the theoretical and intuitive understanding of UORO (and its existing variance reduction technique), and demonstrate a fundamental connection between its gradient estimate and the one that would be computed by REINFORCE if small amounts of noise were added to the RNN's hidden units.
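The "unbiased approximation" mentioned in the abstract can be illustrated with a small sketch. It is not the authors' implementation and not the full UORO recursion. It shows only the standard rank-one trick that such estimators build on: for a random sign vector ν with E[νν^T] = I, the outer product (ρAν)(ν^T B/ρ) is an unbiased estimate of AB for any scale ρ > 0. The matrices `A` and `B`, the function names, and the parameter `rho` are hypothetical and chosen only for illustration.

```python
import itertools
import numpy as np

def rank_one_estimate(A, B, nu, rho=1.0):
    """Rank-one estimate (rho * A nu)(nu^T B / rho) of the product A @ B.

    Illustrative only: nu is a vector of +/-1 signs, rho > 0 is a free
    scaling factor that leaves the expectation unchanged.
    """
    left = rho * (A @ nu)      # vector of shape (m,)
    right = (nu @ B) / rho     # vector of shape (p,)
    return np.outer(left, right)

def exact_mean_estimate(A, B, rho=1.0):
    """Average the estimate over all 2^n sign vectors (the exact expectation)."""
    n = A.shape[1]
    signs = itertools.product([-1.0, 1.0], repeat=n)
    estimates = [rank_one_estimate(A, B, np.array(s), rho) for s in signs]
    return np.mean(estimates, axis=0)

# Hypothetical small matrices standing in for the factors being approximated.
rng = np.random.default_rng(0)
A = rng.normal(size=(3, 4))
B = rng.normal(size=(4, 2))

# Each individual estimate is rank one, but the mean over all sign vectors
# recovers the full product, for any choice of rho.
mean_est = exact_mean_estimate(A, B)
mean_est_scaled = exact_mean_estimate(A, B, rho=2.5)
is_unbiased = np.allclose(mean_est, A @ B) and np.allclose(mean_est_scaled, A @ B)
```

Each single sample is noisy even though the mean is exact; that per-sample noise is the variance the paper analyzes.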
Taxonomy
Topics: Machine Learning and ELM · Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning
Methods: Unbiased Online Recurrent Optimization
