Gradient Forward-Propagation for Large-Scale Temporal Video Modelling
Mateusz Malinowski, Dimitrios Vytiniotis, Grzegorz Swirszcz, Viorica Patraucean, Joao Carreira

TL;DR
This paper introduces Skip-Sideways, a novel neural network training method for large-scale temporal video data that enables low-latency, distributed, and parallel training, improving action recognition and future frame prediction.
Contribution
It extends Sideways by incorporating skip connections for better temporal integration and supports distributed training, enhancing efficiency and performance in large-scale video modeling.
Findings
Achieves low-latency training and model parallelism.
Improves action recognition accuracy on the HMDB51, UCF101, and Kinetics-600 datasets.
Generates better future frames that capture motion cues.
Abstract
How can neural networks be trained on large-volume temporal data efficiently? To compute the gradients required to update parameters, backpropagation blocks computations until the forward and backward passes are completed. For temporal signals, this introduces high latency and hinders real-time learning. It also creates a coupling between consecutive layers, which limits model parallelism and increases memory consumption. In this paper, we build upon Sideways, which avoids blocking by propagating approximate gradients forward in time, and we propose mechanisms for temporal integration of information based on different variants of skip connections. We also show how to decouple computation and delegate individual neural modules to different devices, allowing distributed and parallel training. The proposed Skip-Sideways achieves low latency training, model parallelism, and, importantly, is…
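The idea of propagating approximate gradients forward in time, as the abstract describes it, can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the function names (`sideways_step`, `run_demo`), the three-layer tanh network, the squared-error loss, the constant target, and the synthetic sinusoidal "frames" are all assumptions made for illustration, and the skip-connection variants proposed in the paper are not modelled. At every time step, each layer reads only what its neighbours emitted one step earlier, so no layer waits for a full forward and backward pass.

```python
import numpy as np

def sideways_step(weights, prev_outs, prev_grads, frame, target, lr=0.05):
    """One time step of a toy Sideways-style pipeline (illustrative assumption).

    prev_outs[i]:  output of layer i from the previous time step (or None).
    prev_grads[i]: gradient w.r.t. the output of layer i, emitted by layer i+1
                   at the previous time step (or None while the pipeline fills).
    """
    n = len(weights)
    # Layer 0 reads the new frame; layer i reads what layer i-1 emitted one step ago.
    inputs = [frame] + prev_outs[:-1]
    outs = [None] * n
    grads = [None] * n
    new_weights = list(weights)
    loss = None
    # Each iteration depends only on previous-step state, so in principle the
    # layers could run concurrently on separate devices.
    for i in range(n):
        x = inputs[i]
        if x is None:  # pipeline not filled up to this layer yet
            continue
        W = weights[i]
        pre = W @ x
        if i == n - 1:
            # Linear head with squared-error loss.
            err = pre - target
            loss = 0.5 * float(err @ err)
            g_pre = err
        else:
            outs[i] = np.tanh(pre)
            if prev_grads[i] is None:  # no gradient has arrived yet
                continue
            # Approximate gradient: an older upstream gradient is combined
            # with the Jacobian of the current activation.
            g_pre = prev_grads[i] * (1.0 - outs[i] ** 2)
        if i > 0:
            grads[i - 1] = W.T @ g_pre  # consumed by layer i-1 at the next step
        new_weights[i] = W - lr * np.outer(g_pre, x)
    return new_weights, outs, grads, loss

def run_demo(steps=200, seed=0):
    rng = np.random.default_rng(seed)
    sizes = [4, 8, 8, 2]
    weights = [0.5 * rng.standard_normal((sizes[i + 1], sizes[i])) for i in range(3)]
    outs, grads = [None] * 3, [None] * 3
    target = np.array([0.5, -0.5])
    phases = rng.uniform(0.0, 2.0 * np.pi, size=4)
    losses = []
    for t in range(steps):
        frame = np.sin(0.01 * t + phases)  # slowly varying synthetic "video"
        weights, outs, grads, loss = sideways_step(weights, outs, grads, frame, target)
        losses.append(loss)
    return losses
```

Calling `run_demo()` returns one loss per time step. The first two entries are `None` because a frame needs three steps to reach the head of this three-layer pipeline. The sketch relies on consecutive frames being similar, which is why the synthetic input varies slowly; with rapidly changing input, the mismatch between stale gradients and current activations would grow.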
