Revisiting LSTM Networks for Semi-Supervised Text Classification via Mixed Objective Function

Devendra Singh Sachan; Manzil Zaheer; Ruslan Salakhutdinov

arXiv:2009.04007·cs.CL·September 10, 2020

TL;DR

This paper demonstrates that a simple bidirectional LSTM, trained with a mixed objective function combining supervised and semi-supervised losses, can achieve state-of-the-art results in text classification without complex pretraining.

Contribution

The authors introduce a training strategy using a mixed objective function that enables simple BiLSTM models to outperform complex models in semi-supervised text classification.
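As a rough sketch of what such a mixed objective can look like, the block below writes it as a weighted sum of a supervised term and three auxiliary terms. The weights $\lambda$, the perturbation symbols $r$, and the split into $N_l$ labeled and $N$ total examples are notation assumed here for illustration; the paper's exact formulation and hyperparameters may differ.

```latex
\begin{align}
\mathcal{L}_{\text{mixed}} &= \mathcal{L}_{\text{ML}}
  + \lambda_{\text{EM}}\,\mathcal{L}_{\text{EM}}
  + \lambda_{\text{AT}}\,\mathcal{L}_{\text{AT}}
  + \lambda_{\text{VAT}}\,\mathcal{L}_{\text{VAT}} \\
\mathcal{L}_{\text{ML}} &= -\frac{1}{N_l}\sum_{i=1}^{N_l} \log p(y_i \mid x_i;\theta) \\
\mathcal{L}_{\text{EM}} &= -\frac{1}{N}\sum_{i=1}^{N}\sum_{k} p(k \mid x_i;\theta)\,\log p(k \mid x_i;\theta) \\
\mathcal{L}_{\text{AT}} &= -\frac{1}{N_l}\sum_{i=1}^{N_l} \log p(y_i \mid x_i + r^{\text{adv}}_i;\theta) \\
\mathcal{L}_{\text{VAT}} &= \frac{1}{N}\sum_{i=1}^{N}
  \mathrm{KL}\!\left(p(\cdot \mid x_i;\theta)\,\big\|\,p(\cdot \mid x_i + r^{\text{vadv}}_i;\theta)\right)
\end{align}
```

Here $\mathcal{L}_{\text{ML}}$ is the cross-entropy loss on labeled data, $\mathcal{L}_{\text{EM}}$ is the entropy of the model's predictions, and $r^{\text{adv}}_i$ and $r^{\text{vadv}}_i$ stand for small norm-bounded perturbations of the input representation chosen to increase the respective loss.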

Findings

01. Achieved state-of-the-art results on the ACL-IMDB and AG-News datasets.

02. Outperformed existing methods by a substantial margin.

03. Improved relation extraction performance using the proposed approach.

Abstract

In this paper, we study the bidirectional LSTM network for the task of text classification using both supervised and semi-supervised approaches. Several prior works have suggested that either complex pretraining schemes using unsupervised methods such as language modeling (Dai and Le 2015; Miyato, Dai, and Goodfellow 2016) or complicated models (Johnson and Zhang 2017) are necessary to achieve a high classification accuracy. However, we develop a training strategy that allows even a simple BiLSTM model, when trained with cross-entropy loss, to achieve competitive results compared with more complex approaches. Furthermore, in addition to cross-entropy loss, by using a combination of entropy minimization, adversarial, and virtual adversarial losses for both labeled and unlabeled data, we report state-of-the-art results for the text classification task on several benchmark datasets. In particular,…
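To make the combination of losses named in the abstract concrete, here is a minimal pure-Python sketch that adds cross-entropy, entropy minimization, adversarial, and virtual adversarial terms into one scalar. It is not the authors' implementation: the function names, the dictionary keys (`clean`, `adv`, `vadv`, `label`), and the unit weights are hypothetical, and the predictions on perturbed inputs are passed in as given numbers rather than computed from a BiLSTM.

```python
import math

def cross_entropy(probs, label):
    # Negative log-probability assigned to the gold label.
    return -math.log(probs[label])

def entropy(probs):
    # Shannon entropy of a predicted distribution; p == 0 terms contribute 0.
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def kl_divergence(p, q):
    # KL(p || q); q is assumed strictly positive wherever p is positive.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def mixed_objective(labeled, unlabeled, w_em=1.0, w_at=1.0, w_vat=1.0):
    # Hypothetical example format:
    #   labeled:   {"clean": probs, "adv": probs, "vadv": probs, "label": int}
    #   unlabeled: {"clean": probs, "vadv": probs}
    # "adv"/"vadv" hold predictions on (virtually) adversarially perturbed inputs.
    everything = labeled + unlabeled
    l_ce = mean(cross_entropy(ex["clean"], ex["label"]) for ex in labeled)
    l_at = mean(cross_entropy(ex["adv"], ex["label"]) for ex in labeled)
    l_em = mean(entropy(ex["clean"]) for ex in everything)
    l_vat = mean(kl_divergence(ex["clean"], ex["vadv"]) for ex in everything)
    return l_ce + w_em * l_em + w_at * l_at + w_vat * l_vat

# Toy predictions for one labeled and one unlabeled example (made-up numbers).
labeled = [{"clean": [0.9, 0.1], "adv": [0.7, 0.3], "vadv": [0.8, 0.2], "label": 0}]
unlabeled = [{"clean": [0.6, 0.4], "vadv": [0.5, 0.5]}]
print(f"mixed loss: {mixed_objective(labeled, unlabeled):.4f}")
```

Only the cross-entropy and adversarial terms need labels, so in this sketch they are averaged over labeled examples, while the entropy and virtual adversarial terms are averaged over labeled and unlabeled examples together. In a real training loop the perturbed predictions would come from extra forward and backward passes through the model.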

Code & Models

Repositories

DevSinghSachan/ssl_text_classification (PyTorch)

Taxonomy

Methods: Sigmoid Activation · Bidirectional LSTM · Tanh Activation · Long Short-Term Memory