Semi-supervised Convolutional Neural Networks for Text Categorization via Region Embedding
Rie Johnson, Tong Zhang

TL;DR
This paper introduces a semi-supervised CNN framework that learns region embeddings from unlabeled data to improve text categorization, outperforming previous methods in sentiment and topic classification.
Contribution
It proposes a semi-supervised approach that learns embeddings of small text regions from unlabeled data and integrates them into a supervised CNN for text categorization, in contrast to prior approaches that rely on word embeddings.
Findings
Achieved better results than previous approaches on sentiment classification.
Improved performance on topic classification tasks.
Demonstrated effectiveness of region embeddings learned from unlabeled data.
Abstract
This paper presents a new semi-supervised framework with convolutional neural networks (CNNs) for text categorization. Unlike the previous approaches that rely on word embeddings, our method learns embeddings of small text regions from unlabeled data for integration into a supervised CNN. The proposed scheme for embedding learning is based on the idea of two-view semi-supervised learning, which is intended to be useful for the task of interest even though the training is done on unlabeled data. Our models achieve better results than previous approaches on sentiment classification and topic classification tasks.
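The two-view idea in the abstract can be illustrated with a minimal sketch of how training pairs might be drawn from unlabeled text. The abstract does not specify the two views, so the choice here is an assumption: view 1 is a small text region and view 2 is the region immediately following it, both as bag-of-words counts. The function name `two_view_pairs` and the parameter `region_size` are hypothetical and are not taken from the paper.

```python
from collections import Counter


def two_view_pairs(tokens, region_size=3):
    """Build (region, context) pairs from one unlabeled token sequence.

    Assumption for illustration: view 1 is a window of `region_size`
    consecutive tokens and view 2 is the window immediately after it.
    Each view is represented as a bag-of-words Counter.
    """
    pairs = []
    # Last start index that still leaves room for a full following window.
    last_start = len(tokens) - 2 * region_size
    for start in range(last_start + 1):
        region = tokens[start:start + region_size]
        context = tokens[start + region_size:start + 2 * region_size]
        pairs.append((Counter(region), Counter(context)))
    return pairs


tokens = "the movie was really good and fun".split()
pairs = two_view_pairs(tokens, region_size=3)
for region, context in pairs:
    print(sorted(region), "->", sorted(context))
```

A model trained to predict view 2 from view 1 would need no labels. Under this sketch's assumptions, its hidden representation of the region is what would then be passed to the supervised CNN as a region embedding.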
Taxonomy
Topics: Topic Modeling · Text and Document Classification Technologies · Sentiment Analysis and Opinion Mining
