Semantic Label Smoothing for Sequence to Sequence Problems
Michal Lukasik, Himanshu Jain, Aditya Krishna Menon, Seungyeon Kim, Srinadh Bhojanapalli, Felix Yu, Sanjiv Kumar

TL;DR
This paper introduces a semantic label smoothing technique for sequence-to-sequence tasks like machine translation, which improves regularization by smoothing over relevant, semantically similar sequences rather than all possible outputs.
Contribution
The paper proposes a novel semantic label smoothing method that considers semantically similar sequences, addressing limitations of existing token-level or random smoothing approaches in seq2seq models.
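For reference, the token-level baseline that the contribution contrasts with can be sketched as below. This is a minimal illustration, not code from the paper. It assumes per-step vocabulary logits and a uniform smoothing distribution, which is the common formulation. The function names `log_softmax` and `token_label_smoothing_loss` are hypothetical.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary distribution."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def token_label_smoothing_loss(
    step_logits: Sequence[Sequence[float]],
    target: Sequence[int],
    eps: float = 0.1,
) -> float:
    """Mean per-token cross-entropy against q_t = (1 - eps) * onehot(y_t) + eps / V.

    Assumption: uniform smoothing over the vocabulary at every decoding step
    (the usual token-level formulation). This is the baseline, not the
    paper's semantic method.
    """
    total = 0.0
    for logits, y in zip(step_logits, target):
        logp = log_softmax(logits)
        vocab_size = len(logp)
        nll = -logp[y]                        # loss on the gold token
        uniform_ce = -sum(logp) / vocab_size  # cross-entropy to the uniform distribution
        total += (1.0 - eps) * nll + eps * uniform_ce
    return total / len(target)
```

Setting `eps=0` recovers plain per-token cross-entropy. With `eps > 0`, every vocabulary item receives some target mass at every step, regardless of whether the resulting sequence is well formed.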
Findings
Consistent improvement over state-of-the-art methods
Effective regularization in seq2seq tasks
Enhances translation quality across datasets
Abstract
Label smoothing has been shown to be an effective regularization strategy in classification that prevents overfitting and helps with label denoising. However, extending such methods directly to seq2seq settings, such as machine translation, is challenging: the large target output space of such problems makes it intractable to apply label smoothing over all possible outputs. Most existing approaches for seq2seq settings either perform token-level smoothing or smooth over sequences generated by randomly substituting tokens in the target sequence. Unlike these works, in this paper we propose a technique that smooths over well-formed, relevant sequences that not only have sufficient n-gram overlap with the target sequence but are also semantically similar. Our method shows a consistent and significant improvement over state-of-the-art techniques on different datasets.
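The idea described in the abstract can be sketched roughly as follows. First, select neighbour sequences that are semantically close to the target and share enough n-grams with it. Then mix their negative log-likelihood into the training loss.

This is a minimal sketch under stated assumptions, not the authors' implementation:
- The bag-of-words cosine (`bow_embedding`, `cosine`) is a toy stand-in for a pre-trained sentence encoder.
- `ngram_overlap` is a simplified, BLEU-like clipped n-gram precision.
- The thresholds `min_sim`/`min_overlap`, `k`, and `alpha` are arbitrary.
- Neighbours are assumed to be weighted uniformly.
- `seq_log_prob` is a hypothetical callable that returns the model's log p(y | x) for the current source.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_overlap(candidate: Sequence[str], target: Sequence[str], max_n: int = 2) -> float:
    """Average clipped n-gram precision of candidate vs. target (simplified BLEU-like
    score, no brevity penalty; an assumption, not the paper's exact metric)."""
    scores = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(target, n)
        total = sum(cand.values())
        matched = sum(min(c, ref[g]) for g, c in cand.items())
        scores.append(matched / total if total else 0.0)
    return sum(scores) / len(scores)


def bow_embedding(tokens: Sequence[str]) -> Dict[str, float]:
    # Toy placeholder: a real system would use a pre-trained sentence encoder here.
    return {tok: float(c) for tok, c in Counter(tokens).items()}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def select_smoothing_targets(target: Sequence[str], pool: Sequence[Sequence[str]],
                             k: int = 2, min_sim: float = 0.5,
                             min_overlap: float = 0.3) -> List[Sequence[str]]:
    """Keep well-formed pool sentences that are both similar and n-gram-overlapping,
    ranked by similarity; return the top k (hypothetical selection rule)."""
    t_emb = bow_embedding(target)
    scored = []
    for cand in pool:
        if list(cand) == list(target):
            continue  # the gold target is already covered by the main term
        sim = cosine(t_emb, bow_embedding(cand))
        if sim >= min_sim and ngram_overlap(cand, target) >= min_overlap:
            scored.append((sim, cand))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [cand for _, cand in scored[:k]]


def semantic_smoothing_loss(seq_log_prob: Callable[[Sequence[str]], float],
                            target: Sequence[str],
                            neighbours: Sequence[Sequence[str]],
                            alpha: float = 0.1) -> float:
    """(1 - alpha) * NLL(target) + alpha * mean NLL(neighbours); uniform weights assumed."""
    nll_target = -seq_log_prob(target)
    if not neighbours:
        return nll_target
    nll_neigh = sum(-seq_log_prob(y) for y in neighbours) / len(neighbours)
    return (1.0 - alpha) * nll_target + alpha * nll_neigh
```

Because neighbours come from a pool of real sentences rather than from random token substitutions, every smoothing target is well formed by construction. One plausible choice, again an assumption, is to use the target side of the training corpus as the pool and precompute the neighbours once per example.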
Taxonomy
Methods: Label Smoothing · Tanh Activation · Sigmoid Activation · Long Short-Term Memory · Sequence to Sequence
