Adaptively Truncating Backpropagation Through Time to Control Gradient Bias

Christopher Aicher; Nicholas J. Foti; Emily B. Fox

arXiv:1905.07473 · cs.LG · July 3, 2019 · 21 cites

TL;DR

This paper introduces an adaptive truncation method for backpropagation through time in RNNs, which dynamically controls gradient bias to improve training efficiency and convergence.

Contribution

It proposes a novel adaptive TBPTT scheme that adjusts truncation length based on gradient bias, supported by theoretical analysis and practical estimation methods.

Findings

1. Adaptive TBPTT reduces computational costs compared to fixed truncation.
2. The method improves convergence rates in training RNNs.
3. Experimental results show better performance on language modeling tasks.

Abstract

Truncated backpropagation through time (TBPTT) is a popular method for learning in recurrent neural networks (RNNs) that saves computation and memory at the cost of bias by truncating backpropagation after a fixed number of lags. In practice, choosing the optimal truncation length is difficult: TBPTT will not converge if the truncation length is too small, or will converge slowly if it is too large. We propose an adaptive TBPTT scheme that converts the problem from choosing a temporal lag to one of choosing a tolerable amount of gradient bias. For many realistic RNNs, the TBPTT gradients decay geometrically in expectation for large lags; under this condition, we can control the bias by varying the truncation length adaptively. For RNNs with smooth activation functions, we prove that this bias controls the convergence rate of SGD with biased gradients for our non-convex loss. Using this…
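
The idea of trading a fixed lag for a bias tolerance can be illustrated with a small sketch. It is not the authors' estimator: it assumes we already have per-lag gradient norm estimates (the hypothetical input `norms`), fits a single geometric decay rate `beta` to them, and returns the smallest truncation length whose remaining tail mass, as a fraction of the total, is at most a tolerance `delta`. The function names and the tail-mass proxy for relative bias are assumptions made for illustration only.

```python
import math


def estimate_decay_rate(norms):
    """Average per-lag ratio of the norms, from the first and last entries."""
    if len(norms) < 2 or min(norms) <= 0:
        raise ValueError("need at least two positive norms")
    log_drop = math.log(norms[-1]) - math.log(norms[0])
    return math.exp(log_drop / (len(norms) - 1))


def smallest_truncation(norms, delta):
    """Smallest lag K whose tail mass fraction is at most delta.

    norms[k] is a (hypothetical) estimate of the gradient norm at lag k.
    Lags beyond the measured window are extrapolated geometrically.
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    beta = estimate_decay_rate(norms)
    if beta >= 1.0:
        raise ValueError("norms do not decay; no finite truncation suffices")

    # Geometric series for all lags past the last measured one.
    extrapolated = norms[-1] * beta / (1.0 - beta)
    total = sum(norms) + extrapolated

    # Search inside the measured window first.
    tail = total
    for k, norm in enumerate(norms):
        tail -= norm  # mass of all lags strictly greater than k
        if tail / total <= delta:
            return k

    # Otherwise keep extending the window using the fitted decay rate.
    k = len(norms) - 1
    tail = extrapolated
    while tail / total > delta:
        k += 1
        tail *= beta
    return k


# Example with norms that halve at every lag (an assumed toy input).
norms = [0.5 ** k for k in range(6)]
print(smallest_truncation(norms, delta=0.1))
```

A tighter tolerance yields a longer truncation, and a slower decay (larger `beta`) does the same, which is the sense in which the truncation length adapts to the network being trained.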

Code & Models

Repositories

aicherc/adaptive_tbptt (PyTorch, official)

Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Blind Source Separation Techniques · Advanced Adaptive Filtering Techniques

Methods: Stochastic Gradient Descent