Parallel Training of GRU Networks with a Multi-Grid Solver for Long Sequences

Gordon Euhyun Moon; Eric C. Cyr

arXiv:2203.04738·cs.CV·March 10, 2022

TL;DR

This paper introduces a parallel-in-time training method for GRU networks based on a multigrid reduction in time (MGRIT) solver. It reports up to 6.5x speedup over serial training on long sequences, obtained through hierarchical correction of hidden states.

Contribution

The paper presents a new parallel-in-time training scheme for GRUs based on multigrid reduction in time (MGRIT), enabling efficient training on very long sequences.

Findings

1. Achieves up to 6.5x speedup over serial training.

2. Performance improves with increasing sequence length.

3. Hierarchical correction of the hidden state accelerates end-to-end communication in both forward and backward propagation.

Abstract

Parallelizing Gated Recurrent Unit (GRU) networks is a challenging task, as the training procedure of GRU is inherently sequential. Prior efforts to parallelize GRU have largely focused on conventional parallelization strategies such as data-parallel and model-parallel training algorithms. However, when the given sequences are very long, existing approaches are still inevitably performance limited in terms of training time. In this paper, we present a novel parallel training scheme (called parallel-in-time) for GRU based on a multigrid reduction in time (MGRIT) solver. MGRIT partitions a sequence into multiple shorter sub-sequences and trains the sub-sequences on different processors in parallel. The key to achieving speedup is a hierarchical correction of the hidden state to accelerate end-to-end communication in both the forward and backward propagation phases of gradient descent.…
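The partitioning idea in the abstract can be illustrated with a small sketch. It is not the paper's MGRIT algorithm and does not use torchbraid: it assumes a scalar GRU cell (hidden size 1) with hypothetical weights, and it uses a plain boundary-passing sweep in place of the hierarchical correction. Each sweep propagates all sub-sequences independently (the part that could run on separate processors), then hands each sub-sequence's final hidden state to its successor. With this simplified scheme, exact agreement with the serial result takes as many sweeps as there are sub-sequences, which is the slow end-to-end communication that a hierarchical correction is meant to accelerate.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def gru_step(h, x, p):
    """One step of a scalar GRU cell; p holds hypothetical weights."""
    z = sigmoid(p["wz"] * x + p["uz"] * h + p["bz"])          # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h + p["br"])          # reset gate
    n = math.tanh(p["wn"] * x + p["un"] * (r * h) + p["bn"])  # candidate state
    return (1.0 - z) * n + z * h

def serial_forward(xs, h0, p):
    """Reference: process the whole sequence one step at a time."""
    hs, h = [], h0
    for x in xs:
        h = gru_step(h, x, p)
        hs.append(h)
    return hs

def chunked_forward(xs, h0, p, num_chunks, sweeps):
    """Simplified stand-in for parallel-in-time propagation (not MGRIT)."""
    n = len(xs)
    if n % num_chunks:
        raise ValueError("sequence length must be divisible by num_chunks")
    size = n // num_chunks
    # Initial guesses for the hidden state at the start of each chunk.
    starts = [h0] + [0.0] * (num_chunks - 1)
    hs = [0.0] * n
    for _ in range(sweeps):
        ends = []
        # The chunks are independent within a sweep, so this loop is the
        # part that could be distributed across processors.
        for c in range(num_chunks):
            h = starts[c]
            for t in range(c * size, (c + 1) * size):
                h = gru_step(h, xs[t], p)
                hs[t] = h
            ends.append(h)
        # Correction: each chunk's final state becomes the next chunk's start.
        starts = [h0] + ends[:-1]
    return hs

# Hypothetical weights and inputs, chosen only for illustration.
params = {"wz": 0.5, "uz": -0.3, "bz": 0.1,
          "wr": 0.8, "ur": 0.4, "br": 0.0,
          "wn": 1.2, "un": 0.7, "bn": -0.2}
xs = [math.sin(0.3 * t) for t in range(16)]

serial = serial_forward(xs, 0.0, params)
one_sweep = chunked_forward(xs, 0.0, params, num_chunks=4, sweeps=1)
full = chunked_forward(xs, 0.0, params, num_chunks=4, sweeps=4)

print("max error after 1 sweep: ", max(abs(a - b) for a, b in zip(serial, one_sweep)))
print("max error after 4 sweeps:", max(abs(a - b) for a, b in zip(serial, full)))
```

After four sweeps over four chunks, the chunked result matches the serial one, because each sweep makes one more chunk exact. A single sweep leaves the later chunks started from guessed hidden states.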

Code & Models

Repositories

Multilevel-NN/torchbraid
PyTorch · Official

Videos

Parallel Training of GRU Networks with a Multi-Grid Solver for Long Sequences · SlidesLive

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis

Methods: Gated Recurrent Unit