Backpropagation for long sequences: beyond memory constraints with constant overheads
Navjot Kukreja, Jan Hückelheim, Gerard J. Gorman

TL;DR
This paper introduces a library for backpropagation through long sequences whose computational overhead is constant in the sequence length. It combines asynchronous storing and prefetching of states with the optimal Revolve strategy, reducing the memory footprint without delaying the computation.
Contribution
The authors present a novel library that combines asynchronous data transfer with the Revolve backpropagation strategy, achieving memory reduction with constant overhead regardless of sequence length.
Findings
The memory footprint of backpropagation can be reduced to any size, for example to fit into DRAM.
The approach maintains or improves computational speed compared to previous strategies.
The computational overhead is constant in the sequence length.
Abstract
Naive backpropagation through time has a memory footprint that grows linearly in the sequence length, due to the need to store each state of the forward propagation. This is a problem for large networks. Strategies have been developed to trade memory for added computations, which results in a sublinear growth of memory footprint or computation overhead. In this work, we present a library that uses asynchronous storing and prefetching to move data to and from slow and cheap storage. The library only stores and prefetches states as frequently as possible without delaying the computation, and uses the optimal Revolve backpropagation strategy for the computations in between. The memory footprint of the backpropagation can thus be reduced to any size (e.g. to fit into DRAM), while the computational overhead is constant in the sequence length, and only depends on the ratio between compute…
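The memory-for-recomputation trade-off described in the abstract can be illustrated with a minimal sketch. It uses plain fixed-interval checkpointing on a toy scalar recurrence. This is not the Revolve schedule, it does no asynchronous storing or prefetching, and it is not the library's API. The function names, the recurrence, and the `interval` parameter are all assumptions made for illustration.

```python
import math

def step(x, w):
    """One forward step of a toy recurrence: x_{t+1} = tanh(w * x_t)."""
    return math.tanh(w * x)

def step_grad(x, w, x_bar):
    """Adjoint of `step`: maps dL/dx_{t+1} to (dL/dx_t, contribution to dL/dw)."""
    y = math.tanh(w * x)
    d = (1.0 - y * y) * x_bar
    return w * d, x * d

def naive_backprop(x0, w, n):
    """Store every state: the number of stored states grows linearly in n."""
    states = [x0]
    for _ in range(n):
        states.append(step(states[-1], w))
    x_bar, w_bar = 1.0, 0.0  # loss is L = x_n, so dL/dx_n = 1
    for t in reversed(range(n)):
        x_bar, dw = step_grad(states[t], w, x_bar)
        w_bar += dw
    return w_bar, len(states)

def checkpointed_backprop(x0, w, n, interval):
    """Store every `interval`-th state and recompute each segment in the reverse pass."""
    checkpoints = {}
    x = x0
    for t in range(n):
        if t % interval == 0:
            checkpoints[t] = x
        x = step(x, w)
    x_bar, w_bar = 1.0, 0.0
    peak = len(checkpoints)
    start = ((n - 1) // interval) * interval  # last checkpointed time step
    while start >= 0:
        end = min(start + interval, n)
        # Recompute the states x_start .. x_{end-1} from the checkpoint.
        segment = [checkpoints[start]]
        for _ in range(end - start - 1):
            segment.append(step(segment[-1], w))
        peak = max(peak, len(checkpoints) + len(segment))
        for x_t in reversed(segment):
            x_bar, dw = step_grad(x_t, w, x_bar)
            w_bar += dw
        start -= interval
    return w_bar, peak

grad_naive, mem_naive = naive_backprop(0.5, 0.9, 100)
grad_ckpt, mem_ckpt = checkpointed_backprop(0.5, 0.9, 100, 10)
print(mem_naive, mem_ckpt, math.isclose(grad_naive, grad_ckpt, rel_tol=1e-12))
```

In this sketch the checkpointed variant holds far fewer states at once, at the price of roughly one extra forward pass. The abstract's approach differs in that it hides the cost of moving states to slow storage behind the computation and applies Revolve only between stored states.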
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Caching and Content Delivery
