Jam or Cream First? Modeling Ambiguity in Neural Machine Translation with SCONES
Felix Stahlberg, Shankar Kumar

TL;DR
This paper introduces SCONES, a multi-label classification approach for neural machine translation that models translation ambiguity more effectively than softmax, leading to improved BLEU scores and faster inference.
Contribution
The paper proposes SCONES, a novel loss function and output layer that better handle translation ambiguity, improving translation quality and inference speed in neural machine translation.
Findings
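SCONES yields consistent BLEU score gains across six translation directions, particularly for medium-resource language pairs and small beam sizes.
With smaller beam sizes, SCONES speeds up inference by a factor of 3.9 while still matching or improving the BLEU score.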
SCONES mitigates the beam search curse and better models translation ambiguity.
Abstract
The softmax layer in neural machine translation is designed to model the distribution over mutually exclusive tokens. Machine translation, however, is intrinsically uncertain: the same source sentence can have multiple semantically equivalent translations. Therefore, we propose to replace the softmax activation with a multi-label classification layer that can model ambiguity more effectively. We call our loss function Single-label Contrastive Objective for Non-Exclusive Sequences (SCONES). We show that the multi-label output layer can still be trained on single reference training data using the SCONES loss function. SCONES yields consistent BLEU score gains across six translation directions, particularly for medium-resource language pairs and small beam sizes. By using smaller beam sizes we can speed up inference by a factor of 3.9x and still match or improve the BLEU score obtained…
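To make the contrast between a softmax output and a multi-label output concrete, here is a minimal sketch in plain Python. It is not taken from the paper. It assumes the multi-label layer applies an independent sigmoid to each vocabulary logit, treats the single reference token as the positive example, and treats every other vocabulary token as a negative. The function names and the weighting parameter `alpha` are hypothetical and chosen for illustration only.

```python
import math


def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def softmax_probs(logits):
    """Standard softmax: scores compete and always sum to 1."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid_scores(logits):
    """Independent per-token scores: several tokens can be near 1 at once."""
    return [math.exp(log_sigmoid(z)) for z in logits]


def softmax_nll(logits, target):
    """Cross-entropy against a single reference token under softmax."""
    return -math.log(softmax_probs(logits)[target])


def multilabel_contrastive_loss(logits, target, alpha=1.0):
    """Assumed form: binary cross-entropy with one positive (the reference
    token) and all remaining vocabulary tokens as negatives.
    `alpha` (hypothetical) weights the negative term."""
    positive = -log_sigmoid(logits[target])
    # log(1 - sigmoid(z)) equals log_sigmoid(-z)
    negative = -sum(
        log_sigmoid(-z) for i, z in enumerate(logits) if i != target
    )
    return positive + alpha * negative


# Toy vocabulary of three tokens where the first two are equally good.
logits = [4.0, 4.0, -4.0]
print(softmax_probs(logits))   # the two good tokens split the mass
print(sigmoid_scores(logits))  # both good tokens score close to 1
```

With the toy logits above, softmax forces the two equally good tokens to share the probability mass at roughly one half each, while the sigmoid scores can both be close to 1. That is the sense in which a multi-label layer can represent several acceptable continuations at the same time, even though only one reference token is available as the positive label during training.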
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Softmax
