DeepPCR: Parallelizing Sequential Operations in Neural Networks
Federico Danieli, Miguel Sarabia, Xavier Suau, Pau Rodríguez, Luca Zappella

TL;DR
DeepPCR is a parallel algorithm that accelerates neural network inference and training by reducing the complexity of sequential operations from linear to logarithmic in the number of steps.
Contribution
The paper presents DeepPCR, a parallelization method that interprets a sequence of neural network steps as the solution of a system of equations and solves that system with the Parallel Cyclic Reduction algorithm, reducing the cost of the sequential steps and speeding up training and inference.
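The reformulation described above can be sketched as follows. The notation ($z_l$, $f_l$, $L$, $F$) is assumed here for illustration and is not taken from the paper, and the linearization step is one plausible way to handle nonlinear steps, not a statement of the authors' exact procedure.

```latex
% A sequence of L steps, each depending only on the previous output:
%   z_l = f_l(z_{l-1}),  l = 1, ..., L,  with z_0 given.
% Collecting all steps gives one system in the unknowns z_1, ..., z_L:
\[
F(z) =
\begin{bmatrix}
z_1 - f_1(z_0) \\
z_2 - f_2(z_1) \\
\vdots \\
z_L - f_L(z_{L-1})
\end{bmatrix}
= 0 .
\]
% If every f_l is affine, f_l(z) = A_l z + b_l, the system is linear
% and lower block bidiagonal:
\[
\begin{bmatrix}
I      &        &        &   \\
-A_2   & I      &        &   \\
       & \ddots & \ddots &   \\
       &        & -A_L   & I
\end{bmatrix}
\begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_L \end{bmatrix}
=
\begin{bmatrix} A_1 z_0 + b_1 \\ b_2 \\ \vdots \\ b_L \end{bmatrix} .
\]
% For nonlinear f_l, a linearization such as Newton's method would yield
% a system with the same bidiagonal structure at every iteration, with
% A_l replaced by the Jacobian of f_l at the current iterate.
```

Solving this system by forward substitution reproduces the usual step-by-step evaluation; the bidiagonal structure is what allows a reduction-style solver to proceed in fewer sequential stages.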
Findings
Up to 30x speedup in forward pass
Up to 200x speedup in backward pass
Up to 7x faster training of ResNets and up to 11x faster generation in diffusion models
Abstract
Parallelization techniques have become ubiquitous for accelerating inference and training of deep neural networks. Despite this, several operations are still performed in a sequential manner. For instance, the forward and backward passes are executed layer-by-layer, and the output of diffusion models is produced by applying a sequence of denoising steps. This sequential approach results in a computational cost proportional to the number of steps involved, presenting a potential bottleneck as the number of steps increases. In this work, we introduce DeepPCR, a novel algorithm which parallelizes typically sequential operations in order to speed up inference and training of neural networks. DeepPCR is based on interpreting a sequence of steps as the solution of a specific system of equations, which we recover using the Parallel Cyclic Reduction algorithm. This reduces the complexity of…
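To make the idea of recovering a sequence with a reduction concrete, here is a minimal sketch of a cyclic-reduction-style solve for a scalar linear recurrence z_l = a_l * z_{l-1} + b_l. It is an illustration under simplifying assumptions (scalar, already linear steps), not the paper's implementation; the function names `sequential_solve` and `pcr_solve` are hypothetical. Each pass substitutes every equation into the one `stride` positions later, so the dependency distance doubles and the number of sequential passes grows with log2(L) rather than L.

```python
import numpy as np


def sequential_solve(a, b, z0):
    """Baseline: evaluate z_l = a_l * z_{l-1} + b_l one step at a time."""
    z = np.empty(len(b), dtype=float)
    prev = z0
    for l in range(len(b)):
        prev = a[l] * prev + b[l]
        z[l] = prev
    return z


def pcr_solve(a, b, z0):
    """Cyclic-reduction-style solve of the same recurrence.

    Returns the solution and the number of reduction passes performed.
    Each pass is a vectorized update over all equations, so the work
    inside a pass could run in parallel.
    """
    a = np.array(a, dtype=float)  # copies, so the inputs are not modified
    b = np.array(b, dtype=float)

    # Fold the known initial value into the first equation: z_1 = b_1.
    b[0] += a[0] * z0
    a[0] = 0.0

    stride = 1
    passes = 0
    while stride < len(b):
        new_a = a.copy()
        new_b = b.copy()
        # Substitute equation (l - stride) into equation l, so that
        # equation l now depends on z_{l - 2 * stride}.
        new_b[stride:] = a[stride:] * b[:-stride] + b[stride:]
        new_a[stride:] = a[stride:] * a[:-stride]
        a, b = new_a, new_b
        stride *= 2
        passes += 1

    # Every coefficient a_l is now zero, so b holds the solution.
    return b, passes


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.uniform(-1.0, 1.0, size=8)
    b = rng.normal(size=8)
    z_pcr, passes = pcr_solve(a, b, z0=0.5)
    print(passes, np.allclose(z_pcr, sequential_solve(a, b, 0.5)))
```

With 8 steps the loop runs 3 passes instead of 8 sequential updates. Each pass touches every equation, so the total arithmetic is higher than in the sequential loop; the benefit appears only when the per-pass work is actually executed in parallel.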
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Model Reduction and Neural Networks · Machine Learning in Materials Science
Methods: Diffusion · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
