Parallelizable Stack Long Short-Term Memory
Shuoyang Ding, Philipp Koehn

TL;DR
This paper introduces a parallelization method for StackLSTM that leverages state access patterns to enable efficient GPU training, significantly improving training speed and scalability.
Contribution
It presents a novel approach to parallelize StackLSTM computations by homogenizing state access patterns, facilitating faster training on GPUs.
Findings
Almost linear scaling with batch size in parsing tasks
Significantly faster training with the parallelized PyTorch implementation than with the DyNet C++ implementation
Effective GPU parallelization of StackLSTM operations
Abstract
Stack Long Short-Term Memory (StackLSTM) is useful for various applications such as parsing and string-to-tree neural machine translation, but it is also known to be notoriously difficult to parallelize for GPU training due to the fact that the computations are dependent on discrete operations. In this paper, we tackle this problem by utilizing state access patterns of StackLSTM to homogenize computations with regard to different discrete operations. Our parsing experiments show that the method scales up almost linearly with increasing batch size, and our parallelized PyTorch implementation trains significantly faster compared to the DyNet C++ implementation.
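The idea of homogenizing computations across discrete operations can be illustrated with a small sketch. This is not the authors' implementation. It assumes one plausible reading of the abstract: every batch element runs the same recurrent update and writes the result just above its stack top, and the discrete operation only decides how the stack pointer moves. The names `BatchedStack`, `toy_cell`, `PUSH`, `HOLD`, and `POP` are hypothetical, NumPy stands in for PyTorch, and a plain tanh recurrence stands in for the LSTM cell.

```python
import numpy as np

# Hypothetical encoding: each op is simply the stack-pointer offset.
PUSH, HOLD, POP = 1, 0, -1


def toy_cell(h_prev, x, W, U):
    """Stand-in for an LSTM cell (assumption): a simple tanh recurrence."""
    return np.tanh(h_prev @ W + x @ U)


class BatchedStack:
    """Batched stack of hidden states with one pointer per batch element."""

    def __init__(self, batch_size, max_depth, hidden):
        # Slot 0 is the empty-stack state. One spare slot at the end keeps
        # the unconditional write at ptr + 1 in bounds.
        self.max_depth = max_depth
        self.buf = np.zeros((batch_size, max_depth + 2, hidden))
        self.ptr = np.zeros(batch_size, dtype=np.int64)
        self.rows = np.arange(batch_size)

    def top(self):
        # Gather the current top state of every stack in one indexing call.
        return self.buf[self.rows, self.ptr]

    def step(self, x, ops, W, U):
        # 1. Same computation for every element, whatever its op is.
        h_new = toy_cell(self.top(), x, W, U)
        # 2. Always write just above the top. For HOLD and POP this slot
        #    lies outside the live stack, so the write is harmless.
        self.buf[self.rows, self.ptr + 1] = h_new
        # 3. Only the pointer update depends on the op. Clipping means a
        #    POP on an empty stack or a PUSH past max_depth is ignored.
        self.ptr = np.clip(self.ptr + ops, 0, self.max_depth)
        return self.top()
```

With this layout, a batch whose elements push, hold, and pop at the same time step needs no branching: `stack.step(x, np.array([PUSH, HOLD, POP]), W, U)` performs one batched cell call, one scatter, and one pointer addition.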
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Parallel Computing and Optimization Techniques
