A Stable and Effective Learning Strategy for Trainable Greedy Decoding
Yun Chen, Victor O.K. Li, Kyunghyun Cho, Samuel R. Bowman

TL;DR
This paper introduces a computationally efficient learning strategy that improves greedy decoding in neural machine translation: a small trained neural actor recovers most of the benefit of beam search at nearly no additional computational cost.
Contribution
A new method in which a small neural actor, trained on pseudo-parallel data built from beam search output, improves greedy decoding without reinforcement learning and without giving up the efficiency of greedy decoding.
Findings
Significant translation quality improvements over base models.
Achieves performance close to that of beam search with minimal additional computation.
Effective across multiple datasets and neural architectures.
Abstract
Beam search is a widely used approximate search strategy for neural network decoders, and it generally outperforms simple greedy decoding on tasks like machine translation. However, this improvement comes at substantial computational cost. In this paper, we propose a flexible new method that allows us to reap nearly the full benefits of beam search with nearly no additional computational cost. The method revolves around a small neural network actor that is trained to observe and manipulate the hidden state of a previously-trained decoder. To train this actor network, we introduce the use of a pseudo-parallel corpus built using the output of beam search on a base model, ranked by a target quality metric like BLEU. Our method is inspired by earlier work on this problem, but requires no reinforcement learning, and can be trained reliably on a range of models. Experiments on three parallel…
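The pseudo-parallel corpus described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the beam candidates are scored against a reference translation and that only the top-ranked candidate is kept for each source sentence. The function names, the toy data, and the `unigram_f1` metric (a simple stand-in for BLEU) are all hypothetical.

```python
from collections import Counter


def unigram_f1(candidate, reference):
    """Toy quality metric: unigram F1 between two whitespace-tokenized strings.

    Assumption: this stands in for a real metric such as BLEU.
    """
    cand, ref = candidate.split(), reference.split()
    if not cand or not ref:
        return 0.0
    # Clipped count of tokens shared by candidate and reference.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def build_pseudo_parallel(sources, beam_outputs, references, metric=unigram_f1):
    """Pair each source with the beam candidate that the metric ranks highest.

    Assumption: keeping only the single best candidate per source.
    """
    corpus = []
    for src, candidates, ref in zip(sources, beam_outputs, references):
        best = max(candidates, key=lambda c: metric(c, ref))
        corpus.append((src, best))
    return corpus


# Hypothetical toy data: one source sentence with three beam candidates.
sources = ["das ist ein test"]
beam_outputs = [["this is test", "this is a test", "that is one test"]]
references = ["this is a test"]

pseudo_corpus = build_pseudo_parallel(sources, beam_outputs, references)
```

In a setup like this, the resulting (source, best candidate) pairs would serve as training targets for the actor, so that greedy decoding with the actor is pushed toward the outputs beam search would have found.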
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
