Revisiting Self-Training for Neural Sequence Generation
Junxian He, Jiatao Gu, Jiajun Shen, Marc'Aurelio Ranzato

TL;DR
This paper revisits self-training for neural sequence generation, demonstrating its effectiveness and introducing a noisy self-training method that leverages unlabeled data to significantly improve performance.
Contribution
The paper shows empirically that self-training improves supervised baselines on neural sequence generation tasks, and it proposes injecting noise into the input so that the model makes better use of unlabeled data.
Findings
Self-training improves neural sequence generation performance.
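Dropout (a perturbation on the hidden states) is critical for self-training to benefit from pseudo-parallel data: it acts as a regularizer that pushes the model toward close predictions for similar unlabeled inputs.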
Noisy self-training significantly boosts results on benchmarks.
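The input noise injection mentioned above can be illustrated with a minimal sketch. The summary does not say which noise function is used, so random word dropping is an assumption chosen for illustration, and the names `drop_words`, `noisy_pseudo_pair`, and `predict` are hypothetical. The sketch also assumes that the pseudo target is produced from the clean input and that only the source side is perturbed afterwards.

```python
import random
from typing import Callable, List, Tuple


def drop_words(tokens: List[str], p: float, rng: random.Random) -> List[str]:
    """Drop each token independently with probability p (illustrative noise)."""
    kept = [t for t in tokens if rng.random() >= p]
    # Fall back to the first token so a non-empty input never becomes empty.
    return kept if kept else tokens[:1]


def noisy_pseudo_pair(
    source: str,
    predict: Callable[[str], str],
    p: float = 0.1,
    seed: int = 0,
) -> Tuple[str, str]:
    """Build one pseudo-parallel pair with a perturbed source side."""
    rng = random.Random(seed)
    # Assumption: the pseudo target comes from the clean, unperturbed input.
    target = predict(source)
    noisy_source = " ".join(drop_words(source.split(), p, rng))
    return noisy_source, target


# Hypothetical stand-in for a trained model: upper-cases its input.
toy_predict = str.upper
pair = noisy_pseudo_pair("the cat sat on the mat", toy_predict, p=0.3, seed=1)
```

Training on pairs like `pair` asks the model to map a perturbed input to the prediction made for the clean input, which is one way to encourage similar outputs for similar inputs.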
Abstract
Self-training is one of the earliest and simplest semi-supervised methods. The key idea is to augment the original labeled dataset with unlabeled data paired with the model's prediction (i.e. the pseudo-parallel data). While self-training has been extensively studied on classification problems, in complex sequence generation tasks (e.g. machine translation) it is still unclear how self-training works due to the compositionality of the target space. In this work, we first empirically show that self-training is able to decently improve the supervised baseline on neural sequence generation tasks. Through careful examination of the performance gains, we find that the perturbation on the hidden states (i.e. dropout) is critical for self-training to benefit from the pseudo-parallel data, which acts as a regularizer and forces the model to yield close predictions for similar unlabeled inputs.…
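The key idea described in the abstract, pairing unlabeled inputs with the model's own predictions and adding them to the labeled set, can be sketched as a short loop. This is a toy illustration under stated assumptions, not the authors' implementation: `self_train` and `train_word_table` are hypothetical names, and the "model" is a word-for-word lookup table standing in for a neural sequence generator.

```python
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]
Model = Callable[[str], str]


def train_word_table(pairs: List[Pair]) -> Model:
    """Toy trainer: learn a word-to-word table from position-aligned pairs."""
    table: Dict[str, str] = {}
    for src, tgt in pairs:
        for s, t in zip(src.split(), tgt.split()):
            table.setdefault(s, t)  # keep the first mapping seen for a word

    def predict(sentence: str) -> str:
        # Unknown words are copied through unchanged.
        return " ".join(table.get(w, w) for w in sentence.split())

    return predict


def self_train(
    train_fn: Callable[[List[Pair]], Model],
    labeled: List[Pair],
    unlabeled: List[str],
    rounds: int = 1,
) -> Model:
    """Generic self-training loop over a labeled and an unlabeled set."""
    model = train_fn(labeled)  # supervised baseline
    for _ in range(rounds):
        # Pseudo-parallel data: unlabeled inputs paired with model predictions.
        pseudo = [(x, model(x)) for x in unlabeled]
        # Retrain on the labeled data augmented with the pseudo-parallel data.
        model = train_fn(labeled + pseudo)
    return model


labeled = [("a b", "A B")]
unlabeled = ["b a c"]
model = self_train(train_word_table, labeled, unlabeled, rounds=1)
```

With a lookup table the pseudo-parallel pairs only echo what the baseline already predicts, so this shows the data flow only; the gains reported in the paper concern neural models.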
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
