Semi-Supervised Sequence Modeling with Cross-View Training

Kevin Clark; Minh-Thang Luong; Christopher D. Manning; Quoc V. Le

arXiv:1809.08370 · cs.CL · September 25, 2018 · 20 cites


TL;DR

This paper introduces Cross-View Training (CVT), a semi-supervised learning method that enhances sequence model representations by leveraging unlabeled data through auxiliary tasks with restricted input views.

Contribution

The paper presents CVT, a novel semi-supervised training algorithm that improves Bi-LSTM encoders by training auxiliary prediction modules on unlabeled data. CVT is especially effective when combined with multi-task learning.
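To make "restricted input views" concrete, here is a minimal Python sketch of how the auxiliary modules' inputs could be carved out of a Bi-LSTM's per-token states for sequence tagging. The four views (forward, backward, future, past) follow our reading of the full paper rather than the summary above. The function name `restricted_views`, the plain lists standing in for LSTM state vectors, and the zero-vector padding at sentence boundaries are illustrative assumptions.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def restricted_views(
    fwd: Sequence[Vector], bwd: Sequence[Vector], t: int
) -> Dict[str, Vector]:
    """Build the input each prediction module sees for token position t.

    fwd[i]: forward-LSTM state after reading tokens 0..i.
    bwd[i]: backward-LSTM state after reading tokens i..n-1.
    """
    n = len(fwd)
    # Assumption: a zero vector stands in for a missing state at the
    # sentence boundaries.
    zero = [0.0] * len(fwd[0])
    return {
        # Primary (full) module: both directions, so it sees every token.
        "full": list(fwd[t]) + list(bwd[t]),
        # Sees tokens 0..t (no right context).
        "forward": list(fwd[t]),
        # Sees tokens t..n-1 (no left context).
        "backward": list(bwd[t]),
        # Sees tokens 0..t-1: predicts the tag of a word it has not read.
        "future": list(fwd[t - 1]) if t > 0 else zero,
        # Sees tokens t+1..n-1: likewise never reads the current word.
        "past": list(bwd[t + 1]) if t + 1 < n else zero,
    }


# Hypothetical 1-dimensional states for a three-token sentence.
fwd_states = [[1.0], [2.0], [3.0]]
bwd_states = [[10.0], [20.0], [30.0]]
print(restricted_views(fwd_states, bwd_states, 1))
# {'full': [2.0, 20.0], 'forward': [2.0], 'backward': [20.0], 'future': [1.0], 'past': [30.0]}
```

Each view would feed its own small prediction module. Because every view is a slice of the same encoder states, training the auxiliary modules also updates the shared Bi-LSTM that the full model relies on.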

Findings

01 Achieved state-of-the-art results on five sequence tagging tasks.

02 Improved model representations by leveraging unlabeled data.

03 Improved performance on machine translation and dependency parsing.

Abstract

Unsupervised representation learning algorithms such as word2vec and ELMo improve the accuracy of many supervised NLP models, mainly because they can take advantage of large amounts of unlabeled text. However, the supervised models only learn from task-specific labeled data during the main training phase. We therefore propose Cross-View Training (CVT), a semi-supervised learning algorithm that improves the representations of a Bi-LSTM sentence encoder using a mix of labeled and unlabeled data. On labeled examples, standard supervised learning is used. On unlabeled examples, CVT teaches auxiliary prediction modules that see restricted views of the input (e.g., only part of a sentence) to match the predictions of the full model seeing the whole input. Since the auxiliary modules and the full model share intermediate representations, this in turn improves the full model. Moreover, we show…
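The two training signals described in the abstract can be sketched for a single token position in plain Python: cross-entropy on labeled examples, and on unlabeled examples a penalty that pulls each auxiliary module's prediction toward the full model's. Using KL divergence as that penalty and treating the full model's prediction as a fixed target follow our reading of the paper, not the abstract text above. The function names are hypothetical, and a real implementation would compute these losses in an autodiff framework on top of the Bi-LSTM encoder.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def supervised_loss(logits: Sequence[float], label: int) -> float:
    # Labeled example: ordinary cross-entropy against the gold tag.
    return -math.log(softmax(logits)[label])


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    # KL(p || q); terms where p_i == 0 contribute nothing.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def cvt_unlabeled_loss(
    primary_logits: Sequence[float], auxiliary_logits: Sequence[Sequence[float]]
) -> float:
    # Unlabeled example: the full model's distribution is the soft target.
    # In an autodiff framework this target would be detached (no gradient),
    # so only the auxiliary modules and the shared encoder beneath them learn
    # from this term.
    target = softmax(primary_logits)
    return sum(kl_divergence(target, softmax(aux)) for aux in auxiliary_logits)


# Hypothetical logits for one token over three tags.
primary = [2.0, 0.5, -1.0]
auxiliaries = [[1.5, 0.7, -0.5], [0.0, 0.0, 0.0]]
print(supervised_loss(primary, 0))
print(cvt_unlabeled_loss(primary, auxiliaries))
```

The unlabeled loss is zero only when every auxiliary module already agrees with the full model. How the labeled and unlabeled losses are scheduled during training (for example, alternating minibatches) is left out of this sketch.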

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Bidirectional LSTM · Convolution · CNN Bidirectional LSTM · Softmax · Dropout · Cross-View Training