Parallelizable Stack Long Short-Term Memory

Shuoyang Ding; Philipp Koehn

arXiv:1904.03409 · cs.CL · April 9, 2019 · 1 citation

TL;DR

This paper parallelizes StackLSTM for GPU training by exploiting its state access patterns, so that different discrete stack operations share the same computation. In parsing experiments the method scales almost linearly with batch size.

Contribution

The paper homogenizes StackLSTM computations across different discrete operations by exploiting state access patterns, which makes batched training on GPUs practical.

Findings

01 Almost linear scaling with increasing batch size in parsing experiments

02 Significantly faster training with the parallelized PyTorch implementation than with the DyNet C++ implementation

03 Effective GPU parallelization of StackLSTM operations

Abstract

Stack Long Short-Term Memory (StackLSTM) is useful for various applications such as parsing and string-to-tree neural machine translation, but it is notoriously difficult to parallelize for GPU training because its computations depend on discrete operations. In this paper, we tackle this problem by utilizing state access patterns of StackLSTM to homogenize computations with regard to different discrete operations. Our parsing experiments show that the method scales up almost linearly with increasing batch size, and our parallelized PyTorch implementation trains significantly faster than the DyNet C++ implementation.
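
The idea of homogenizing computations across discrete operations can be illustrated with a small sketch. The code below is not the authors' implementation. It assumes each stack is a preallocated buffer with a top pointer, that operations are encoded as +1 (push), -1 (pop), and 0 (hold), and that every operation sequence is valid (no pop on an empty stack). The `cell` function is a toy stand-in for an LSTM cell, and all names here are hypothetical. Every batch element runs the same steps regardless of its operation: read the state at the pointer, write the new state one slot above, then move the pointer by the operation code.

```python
import math

def cell(prev_state, x):
    # Toy stand-in for an LSTM cell (assumption, not the paper's network).
    return math.tanh(0.5 * prev_state + x)

def make_stacks(batch_size, max_depth):
    # Slot 0 holds the initial (empty-stack) state; one spare slot on top
    # absorbs the write that a hold or pop makes at full depth.
    buffers = [[0.0] * (max_depth + 2) for _ in range(batch_size)]
    pointers = [0] * batch_size
    return buffers, pointers

def homogenized_step(buffers, pointers, inputs, ops):
    # Same three steps for every example, with no branch on the operation.
    for b in range(len(buffers)):
        top = pointers[b]
        buffers[b][top + 1] = cell(buffers[b][top], inputs[b])  # always compute
        pointers[b] = top + ops[b]  # +1 keeps the write, -1/0 ignores it

def branching_step(stacks, inputs, ops):
    # Reference version that branches on the operation, for comparison.
    for b, op in enumerate(ops):
        if op == 1:
            stacks[b].append(cell(stacks[b][-1], inputs[b]))
        elif op == -1:
            stacks[b].pop()

def tops(buffers, pointers):
    return [buffers[b][pointers[b]] for b in range(len(buffers))]

# Three stacks, four time steps, a different operation sequence per stack.
inputs_seq = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9], [1.0, 1.1, 1.2]]
ops_seq = [[1, 1, 1], [1, -1, 0], [-1, 1, 1], [1, 0, -1]]

buffers, pointers = make_stacks(batch_size=3, max_depth=2)
reference = [[0.0] for _ in range(3)]
for inputs, ops in zip(inputs_seq, ops_seq):
    homogenized_step(buffers, pointers, inputs, ops)
    branching_step(reference, inputs, ops)

homogenized_tops = tops(buffers, pointers)
reference_tops = [stack[-1] for stack in reference]
```

In a tensor library, the per-example loop in `homogenized_step` would presumably become one batched read, one batched cell call, and one batched write, which is what makes the uniform access pattern attractive on a GPU. The cost is that pop and hold steps compute a state that is never read.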

Code & Models

Repositories

shuoyangd/hoolock (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Parallel Computing and Optimization Techniques