Smoothing and Shrinking the Sparse Seq2Seq Search Space
Ben Peters, André F. T. Martins

TL;DR
This paper shows that entmax-based sparse sequence-to-sequence models address the length bias of seq2seq models by shrinking the search space. It also generalizes label smoothing to these models, improving performance on machine translation and other linguistic tasks.
Contribution
The paper demonstrates that entmax models solve the length bias problem, generalizes label smoothing to Fenchel-Young losses, and achieves state-of-the-art results on multilingual grapheme-to-phoneme conversion.
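For reference, α-entmax maps a score vector z to p_i = [(α − 1) z_i − τ]_+^(1/(α − 1)), where the threshold τ is chosen so that the probabilities sum to one. The limit α → 1 recovers softmax and α = 2 gives sparsemax. Any entry whose scaled score falls below τ gets exactly zero probability, which is what lets these models discard hypotheses outright. The sketch below finds τ by bisection in plain Python. It is an illustration of that definition under stated assumptions, not the authors' implementation: the helper name `entmax_bisection` and the fixed iteration count are hypothetical choices.

```python
def entmax_bisection(z, alpha=1.5, n_iter=60):
    """Hypothetical helper: alpha-entmax of a list of scores via bisection on tau."""
    if alpha <= 1:
        raise ValueError("alpha must be > 1 (alpha -> 1 is the softmax limit)")
    d = len(z)
    exponent = 1.0 / (alpha - 1)
    scaled = [(alpha - 1) * v for v in z]
    top = max(scaled)

    def mass(tau):
        # Total probability the threshold tau would produce; non-increasing in tau.
        return sum(max(v - tau, 0.0) ** exponent for v in scaled)

    lo = top - 1.0                       # the top entry alone receives mass 1, so mass(lo) >= 1
    hi = top - (1.0 / d) ** (alpha - 1) # every entry receives at most 1/d, so mass(hi) <= 1
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if mass(mid) >= 1.0:
            lo = mid
        else:
            hi = mid
    p = [max(v - lo, 0.0) ** exponent for v in scaled]
    total = sum(p)
    return [x / total for x in p]  # renormalize away the residual bisection error


print(entmax_bisection([3.0, 0.0, -3.0], alpha=1.5))  # [1.0, 0.0, 0.0]
```

With α = 1.5 and scores [3.0, 0.0, -3.0], the two low-scoring entries fall below the threshold and receive exactly zero probability. A softmax over the same scores would give them small but nonzero mass.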
Findings
Entmax models remove the 'cat got your tongue' problem in translation.
Label smoothing is generalized to Fenchel-Young losses.
Models achieve state-of-the-art on multilingual grapheme-to-phoneme conversion.
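A Fenchel-Young loss for a regularizer Ω is L_Ω(z; q) = Ω*(z) + Ω(q) − ⟨z, q⟩, where Ω* is the convex conjugate of Ω. Sparsemax corresponds to Ω(p) = ½‖p‖², and with Ω equal to the negative Shannon entropy the loss equals cross-entropy minus the (constant) entropy of the target. The sketch below assumes that the label-smoothed loss is the ordinary Fenchel-Young loss evaluated at the smoothed target (1 − ε)·e_gold + ε/|V|·1, which is how label smoothing is usually defined for cross-entropy. It uses sparsemax only because it has a closed form, and the helper names `sparsemax_fy_loss`, `smoothed_target`, and `fy_label_smoothing_loss` are hypothetical.

```python
def sparsemax(z):
    # Euclidean projection of z onto the probability simplex (closed form).
    zs = sorted(z, reverse=True)
    cumsum, k, support_sum = 0.0, 0, 0.0
    for i, v in enumerate(zs, start=1):
        cumsum += v
        if 1 + i * v > cumsum:  # entry i is still inside the support
            k, support_sum = i, cumsum
    tau = (support_sum - 1.0) / k
    return [max(v - tau, 0.0) for v in z]


def sparsemax_fy_loss(z, target):
    # Fenchel-Young loss with Omega(p) = 0.5 * ||p||^2, so that
    # Omega*(z) = <z, p> - Omega(p) with p = sparsemax(z).
    def omega(v):
        return 0.5 * sum(x * x for x in v)

    p = sparsemax(z)
    conjugate = sum(zi * pi for zi, pi in zip(z, p)) - omega(p)
    return conjugate + omega(target) - sum(zi * qi for zi, qi in zip(z, target))


def smoothed_target(gold, vocab_size, eps):
    # (1 - eps) * one_hot(gold) + eps * uniform over the vocabulary.
    return [(1.0 - eps) * (1.0 if i == gold else 0.0) + eps / vocab_size
            for i in range(vocab_size)]


def fy_label_smoothing_loss(z, gold, eps=0.1):
    # Assumed form: the Fenchel-Young loss evaluated at the smoothed target.
    return sparsemax_fy_loss(z, smoothed_target(gold, len(z), eps))


z = [5.0, 0.0, 0.0]
print(fy_label_smoothing_loss(z, gold=0, eps=0.0))  # 0.0
print(fy_label_smoothing_loss(z, gold=0, eps=0.1))  # about 0.27
```

Without smoothing, a confident, correct prediction costs nothing. With ε = 0.1 the same scores are penalized because sparsemax puts all its mass on the gold token while the smoothed target does not. The gradient of any Fenchel-Young loss with respect to z is p − q, which is why this construction carries over to any Ω.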
Abstract
Current sequence-to-sequence models are trained to minimize cross-entropy and use softmax to compute the locally normalized probabilities over target sequences. While this setup has led to strong results in a variety of tasks, one unsatisfying aspect is its length bias: models give high scores to short, inadequate hypotheses and often make the empty string the argmax, the so-called 'cat got your tongue' problem. Recently proposed entmax-based sparse sequence-to-sequence models present a possible solution, since they can shrink the search space by assigning zero probability to bad hypotheses, but their ability to handle word-level tasks with transformers has never been tested. In this work, we show that entmax-based models effectively solve the 'cat got your tongue' problem, removing a major source of model error for neural machine translation. In addition, we generalize label smoothing, a…
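The length bias described in the abstract can be reproduced with a toy decoder. The sketch below assumes a hypothetical three-token vocabulary and a hypothetical `toy_logits` function standing in for a trained model. It scores every complete hypothesis of up to three tokens exhaustively instead of running beam search, and it uses sparsemax (α-entmax with α = 2) as a simple stand-in for the entmax variants studied in the paper. Because a sequence score is a product of per-step probabilities, the softmax model prefers the empty string. The sparse model gives the end-of-sequence token zero probability at the first step, so the empty string leaves the search space entirely.

```python
import itertools
import math

VOCAB = ["a", "b", "</s>"]
EOS = len(VOCAB) - 1  # index of the end-of-sequence token


def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def sparsemax(z):
    # Euclidean projection of z onto the probability simplex (closed form).
    zs = sorted(z, reverse=True)
    cumsum, k, support_sum = 0.0, 0, 0.0
    for i, v in enumerate(zs, start=1):
        cumsum += v
        if 1 + i * v > cumsum:
            k, support_sum = i, cumsum
    tau = (support_sum - 1.0) / k
    return [max(v - tau, 0.0) for v in z]


def toy_logits(prefix):
    # Hypothetical decoder: step 1 mildly prefers "a"/"b" over EOS,
    # every later step is indifferent between all three tokens.
    return [2.0, 1.9, 1.0] if not prefix else [1.0, 1.0, 1.0]


def sequence_prob(tokens, normalize):
    # Locally normalized score: product of per-step probabilities, EOS appended.
    prob, prefix = 1.0, []
    for tok in list(tokens) + [EOS]:
        prob *= normalize(toy_logits(prefix))[tok]
        prefix.append(tok)
    return prob


def exact_argmax(normalize, max_len=3):
    # Score every complete hypothesis with up to max_len non-EOS tokens.
    candidates = [seq for n in range(max_len + 1)
                  for seq in itertools.product(range(EOS), repeat=n)]
    best = max(candidates, key=lambda s: sequence_prob(s, normalize))
    return tuple(VOCAB[t] for t in best), sequence_prob(best, normalize)


print(exact_argmax(softmax))    # the empty hypothesis () wins
print(exact_argmax(sparsemax))  # ('a',) wins; the empty hypothesis has probability 0.0
```

Under softmax, the empty string scores about 0.162 while "a" scores about 0.147, because every extra token multiplies in another factor below one. Under sparsemax, the empty string's probability is exactly zero, so no amount of search can return it.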
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
Methods: Softmax
