TL;DR
This paper shows that a simple bidirectional LSTM, trained with a mixed objective that combines cross-entropy with entropy minimization, adversarial, and virtual adversarial losses on labeled and unlabeled data, can achieve state-of-the-art text classification results without complex pretraining.
Contribution
The authors introduce a training strategy using a mixed objective function that enables simple BiLSTM models to outperform complex models in semi-supervised text classification.
Findings
Achieved state-of-the-art results on ACL-IMDB and AG-News datasets.
Outperformed existing methods by a substantial margin.
Improved relation extraction performance using the proposed approach.
Abstract
In this paper, we study a bidirectional LSTM network for the task of text classification using both supervised and semi-supervised approaches. Several prior works have suggested that either complex pretraining schemes using unsupervised methods such as language modeling (Dai and Le 2015; Miyato, Dai, and Goodfellow 2016) or complicated models (Johnson and Zhang 2017) are necessary to achieve high classification accuracy. However, we develop a training strategy that allows even a simple BiLSTM model, when trained with cross-entropy loss, to achieve competitive results compared with more complex approaches. Furthermore, in addition to cross-entropy loss, by using a combination of entropy minimization, adversarial, and virtual adversarial losses for both labeled and unlabeled data, we report state-of-the-art results for the text classification task on several benchmark datasets. In particular,…
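To make the mixed objective in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes a linear softmax classifier over a fixed feature vector as a stand-in for the BiLSTM, equal loss weights, and a hypothetical perturbation size `epsilon`; all function and parameter names are illustrative. The virtual adversarial term is left out for brevity, since it needs an extra approximation step to find the perturbation direction.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def logits_fn(W, x):
    # Assumption: a linear layer stands in for the BiLSTM classifier.
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]

def cross_entropy(p, label):
    # Supervised loss for one example.
    return -math.log(p[label])

def entropy(p):
    # Entropy minimization term: low when the prediction is confident.
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)

def adversarial_loss(W, x, label, epsilon):
    # Cross-entropy at an input shifted along the normalized loss gradient.
    # For a linear softmax model the gradient w.r.t. x is W^T (p - onehot).
    p = softmax(logits_fn(W, x))
    diff = [pi - (1.0 if k == label else 0.0) for k, pi in enumerate(p)]
    grad = [sum(W[k][j] * diff[k] for k in range(len(W)))
            for j in range(len(x))]
    norm = math.sqrt(sum(g * g for g in grad)) or 1.0
    x_adv = [xj + epsilon * g / norm for xj, g in zip(x, grad)]
    return cross_entropy(softmax(logits_fn(W, x_adv)), label)

def mixed_objective(W, labeled, unlabeled, epsilon=0.5,
                    lam_em=1.0, lam_at=1.0):
    # labeled: list of (x, y) pairs; unlabeled: list of x vectors.
    # The weights lam_em and lam_at are hypothetical hyperparameters.
    n = len(labeled)
    ce = sum(cross_entropy(softmax(logits_fn(W, x)), y)
             for x, y in labeled) / n
    at = sum(adversarial_loss(W, x, y, epsilon) for x, y in labeled) / n
    # Entropy minimization needs no labels, so it uses all examples.
    all_x = [x for x, _ in labeled] + list(unlabeled)
    em = sum(entropy(softmax(logits_fn(W, x))) for x in all_x) / len(all_x)
    return ce + lam_em * em + lam_at * at
```

In this sketch the entropy term is the only one that touches unlabeled data. In the setup the abstract describes, the virtual adversarial loss would also apply to unlabeled examples, because it compares the model's predictions before and after perturbation rather than relying on a gold label.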
Taxonomy
Methods: Sigmoid Activation · Bidirectional LSTM · Tanh Activation · Long Short-Term Memory
