Unbiasing Truncated Backpropagation Through Time
Corentin Tallec, Yann Ollivier

TL;DR
This paper introduces Anticipated Reweighted Truncated Backpropagation (ARTBP), an unbiased variant of truncated BPTT that keeps its computational efficiency, converges more reliably, and slightly improves performance when learning recurrent models.
Contribution
The paper proposes ARTBP, an algorithm that makes the gradient estimate of truncated BPTT unbiased by combining variable truncation lengths with compensation factors in the backpropagation equation.
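The role of the compensation factors can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from the paper: a scalar linear recurrence s_t = a * s_{t-1} + x_t, a loss 0.5 * s_t^2 at every step, a constant truncation probability c at each boundary, and a compensation factor 1 / (1 - c) applied whenever the gradient is carried across a boundary that was not truncated. The helper names (forward, grad_estimate, expected_grad) are hypothetical. Because the sequence is short, the sketch enumerates every truncation pattern and computes the exact expectation instead of sampling.

```python
import itertools


def forward(a, xs, s0=0.0):
    """Run the toy recurrence s_t = a * s_{t-1} + x_t; returns [s_0, ..., s_T]."""
    states = [s0]
    for x in xs:
        states.append(a * states[-1] + x)
    return states


def grad_estimate(a, states, cuts, c, reweight=True):
    """Estimate d/da of sum_t 0.5 * s_t**2 by backpropagating with truncations.

    cuts[t - 1] is True when backpropagation is truncated between steps
    t and t + 1. When the gradient does cross a boundary and reweight is
    True, it is multiplied by 1 / (1 - c) (the assumed compensation factor).
    """
    T = len(states) - 1
    scale = 1.0 / (1.0 - c) if reweight else 1.0
    delta, grad = 0.0, 0.0
    for t in range(T, 0, -1):
        if t == T or cuts[t - 1]:
            delta = states[t]                      # nothing flows in from the future
        else:
            delta = states[t] + scale * a * delta  # carry the future gradient back
        grad += delta * states[t - 1]              # ds_t/da = s_{t-1} for this model
    return grad


def expected_grad(a, states, c, reweight):
    """Exact expectation over all truncation patterns (independent cuts, prob. c)."""
    T = len(states) - 1
    total = 0.0
    for cuts in itertools.product((False, True), repeat=T - 1):
        k = sum(cuts)
        prob = c ** k * (1.0 - c) ** (T - 1 - k)
        total += prob * grad_estimate(a, states, cuts, c, reweight)
    return total


a, c = 0.9, 0.3                      # illustrative values only
xs = [1.0, -0.5, 2.0, 0.3, -1.2]
states = forward(a, xs)

# Full BPTT is the special case with no truncation and no reweighting.
full = grad_estimate(a, states, (False,) * (len(xs) - 1), c=0.0)
artbp_mean = expected_grad(a, states, c, reweight=True)
naive_mean = expected_grad(a, states, c, reweight=False)

print(f"full BPTT gradient:             {full:.6f}")
print(f"mean with compensation factors: {artbp_mean:.6f}")
print(f"mean without compensation:      {naive_mean:.6f}")
```

In this toy setting the reweighted expectation matches the full BPTT gradient up to floating-point error, while the same random truncation without reweighting underestimates the contribution of long-range terms. The price of unbiasedness here is higher variance in individual estimates, since each surviving backward path is scaled up.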
Findings
ARTBP converges reliably on a synthetic task with complex dependencies.
ARTBP slightly outperforms truncated BPTT on Penn Treebank language modeling.
ARTBP maintains computational benefits of truncated BPTT while providing unbiased gradients.
Abstract
Truncated Backpropagation Through Time (truncated BPTT) is a widespread method for learning recurrent computational graphs. Truncated BPTT keeps the computational benefits of Backpropagation Through Time (BPTT) while relieving the need for a complete backtrack through the whole data sequence at every step. However, truncation favors short-term dependencies: the gradient estimate of truncated BPTT is biased, so that it does not benefit from the convergence guarantees from stochastic gradient theory. We introduce Anticipated Reweighted Truncated Backpropagation (ARTBP), an algorithm that keeps the computational benefits of truncated BPTT, while providing unbiasedness. ARTBP works by using variable truncation lengths together with carefully chosen compensation factors in the backpropagation equation. We check the viability of ARTBP on two tasks. First, a simple synthetic task where careful…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Graph Neural Networks
