Noisy Parallel Approximate Decoding for Conditional Recurrent Language Model
Kyunghyun Cho

TL;DR
This paper introduces a novel, embarrassingly parallel decoding strategy for conditional recurrent language models that improves on an existing decoding algorithm, evaluated on attention-based neural machine translation.
Contribution
It proposes a new decoding method that is embarrassingly parallelizable, requires no communication between parallel workers, and improves upon a standard decoding algorithm.
Findings
Improved translation quality on English-to-Czech (En->Cz) neural machine translation
Decoding strategy is embarrassingly parallelizable, with no communication overhead between workers
Enhances existing decoding algorithms without additional communication costs
Abstract
Recent advances in conditional recurrent language modelling have mainly focused on network architectures (e.g., attention mechanism), learning algorithms (e.g., scheduled sampling and sequence-level training) and novel applications (e.g., image/video description generation, speech recognition, etc.). On the other hand, we notice that decoding algorithms/strategies have not been investigated as much, and it has become standard to use greedy or beam search. In this paper, we propose a novel decoding strategy motivated by an earlier observation that nonlinear hidden layers of a deep neural network stretch the data manifold. The proposed strategy is embarrassingly parallelizable without any communication overhead, while improving an existing decoding algorithm. We extensively evaluate it with attention-based neural machine translation on the task of En->Cz translation.
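The abstract does not spell out the procedure, so the following is only a minimal sketch of the general idea suggested by the title: run several greedy decoders independently, each injecting Gaussian noise into the decoder's hidden state, and keep the candidate the model scores highest. Everything here is an assumption for illustration: the toy RNN and the helper names (make_toy_model, step, noisy_greedy, score, npad) are hypothetical; the annealed schedule sigma_t = sigma0 / t, the use of the noise-free model to score candidates, and the inclusion of one noise-free (plain greedy) chain are choices of this sketch, not necessarily the paper's.

```python
import math
import random

VOCAB = 5        # hypothetical toy vocabulary; token 0 is end-of-sequence
EOS = 0
BOS = VOCAB      # extra embedding row used only as the start symbol
HIDDEN = 4       # hypothetical toy hidden size


def make_toy_model(seed=0):
    """Random weights for a toy recurrent decoder (illustration only)."""
    rng = random.Random(seed)
    w_hh = [[rng.gauss(0, 0.5) for _ in range(HIDDEN)] for _ in range(HIDDEN)]
    w_xh = [[rng.gauss(0, 0.5) for _ in range(HIDDEN)] for _ in range(VOCAB + 1)]
    w_hy = [[rng.gauss(0, 1.0) for _ in range(HIDDEN)] for _ in range(VOCAB)]
    return w_hh, w_xh, w_hy


def step(model, h, prev_tok):
    """One recurrent transition: h_t = tanh(W_hh h_{t-1} + embed(prev_tok))."""
    w_hh, w_xh, _ = model
    return [math.tanh(sum(w_hh[i][j] * h[j] for j in range(HIDDEN)) + w_xh[prev_tok][i])
            for i in range(HIDDEN)]


def log_softmax(h, w_hy):
    """Log-probabilities over the vocabulary given hidden state h."""
    logits = [sum(w[j] * h[j] for j in range(HIDDEN)) for w in w_hy]
    m = max(logits)
    lse = m + math.log(sum(math.exp(l - m) for l in logits))
    return [l - lse for l in logits]


def noisy_greedy(model, h0, sigma0, seed, max_len=10):
    """Greedy decoding with Gaussian noise added to the hidden state at each step.

    h0 stands in for the source-conditioned initial state. ASSUMPTION: the noise
    standard deviation is annealed as sigma_t = sigma0 / t. sigma0 = 0 gives plain greedy.
    """
    rng = random.Random(seed)
    h, prev, tokens = list(h0), BOS, []
    for t in range(1, max_len + 1):
        h = step(model, h, prev)
        sigma_t = sigma0 / t
        h = [x + rng.gauss(0.0, sigma_t) for x in h]  # noisy state is carried forward
        lp = log_softmax(h, model[2])
        tok = max(range(VOCAB), key=lp.__getitem__)
        tokens.append(tok)
        prev = tok
        if tok == EOS:
            break
    return tokens


def score(model, h0, tokens):
    """Log-probability of a token sequence under the noise-free model."""
    h, prev, total = list(h0), BOS, 0.0
    for tok in tokens:
        h = step(model, h, prev)
        total += log_softmax(h, model[2])[tok]
        prev = tok
    return total


def npad(model, h0, num_chains=8, sigma0=0.5, max_len=10):
    """Run independent noisy greedy chains and return the best-scoring candidate.

    Each chain needs no information from the others, so in practice the chains
    could run on separate workers; here they run sequentially for simplicity.
    Chain 0 is noise-free, so the result never scores below plain greedy.
    """
    candidates = [noisy_greedy(model, h0, 0.0 if k == 0 else sigma0, seed=k, max_len=max_len)
                  for k in range(num_chains)]
    return max(candidates, key=lambda toks: score(model, h0, toks))
```

Because the noise-free chain is always among the candidates, the selected output scores at least as high as plain greedy decoding under this sketch's scoring rule; note that raw log-probability favours shorter outputs, which a real system might correct with length normalization.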
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
