The Efficiency of Pre-training with Objective Masking in Pseudo Labeling for Semi-Supervised Text Classification
Arezoo Hatefi, Xuan-Son Vu, Monowar Bhuyan, Frank Drewes

TL;DR
This paper investigates the impact of objective masking during pre-training in a semi-supervised text classification model, enhancing the teacher-student pseudo-labeling approach with an unsupervised pre-training phase.
Contribution
It introduces an unsupervised pre-training phase with objective masking into the semi-supervised teacher-student model for text classification, improving performance.
Findings
Pre-training with objective masking improves classification accuracy.
The extended model outperforms baseline methods across datasets.
Results are consistent across the English and Swedish datasets.
Abstract
We extend and study a semi-supervised model for text classification proposed earlier by Hatefi et al. for classification tasks in which document classes are described by a small number of gold-labeled examples, while the majority of training examples are unlabeled. The model leverages the teacher-student architecture of Meta Pseudo Labels, in which a "teacher" generates labels for originally unlabeled training data to train the "student" and updates its own model iteratively based on the performance of the student on the gold-labeled portion of the data. We extend the original model of Hatefi et al. with an unsupervised pre-training phase based on objective masking, and conduct in-depth performance evaluations of the original model, our extension, and various independent baselines. Experiments are performed using three different datasets in two different languages (English and Swedish).
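The teacher-student feedback loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes linear softmax classifiers over fixed feature vectors instead of text encoders, approximates the student's feedback by the change in its loss on the gold-labeled batch, and omits the objective-masking pre-training phase entirely. All names (`mpl_step`, `W_teacher`, `W_student`, `lr`) are hypothetical and chosen only for illustration.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax with the usual max-subtraction for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(W, X, y):
    # Mean cross-entropy of a linear softmax classifier with weights W.
    p = softmax(X @ W)
    return -np.mean(np.log(p[np.arange(len(y)), y] + 1e-12))

def ce_grad(W, X, y):
    # Gradient of the mean cross-entropy with respect to W.
    p = softmax(X @ W)
    p[np.arange(len(y)), y] -= 1.0
    return X.T @ p / len(y)

def mpl_step(W_teacher, W_student, X_lab, y_lab, X_unl, rng, lr=0.1):
    """One simplified teacher-student update (illustrative assumption only)."""
    # 1. The teacher samples pseudo labels for the unlabeled batch.
    probs = softmax(X_unl @ W_teacher)
    pseudo = np.array([rng.choice(probs.shape[1], p=p) for p in probs])

    # 2. The student takes a gradient step on the pseudo-labeled batch.
    loss_before = cross_entropy(W_student, X_lab, y_lab)
    W_student_new = W_student - lr * ce_grad(W_student, X_unl, pseudo)
    loss_after = cross_entropy(W_student_new, X_lab, y_lab)

    # 3. Feedback: positive if the pseudo labels reduced the student's
    #    loss on the gold-labeled batch, negative if they increased it.
    h = loss_before - loss_after

    # 4. The teacher reinforces pseudo labels in proportion to the feedback
    #    and also takes an ordinary supervised step on the labeled batch.
    g_teacher = h * ce_grad(W_teacher, X_unl, pseudo) + ce_grad(W_teacher, X_lab, y_lab)
    W_teacher_new = W_teacher - lr * g_teacher
    return W_teacher_new, W_student_new
```

Calling `mpl_step` repeatedly on small labeled and larger unlabeled batches alternates the two updates; in this sketch only the teacher ever sees gold labels directly, while the student learns from pseudo labels alone.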
Taxonomy
Topics: Text and Document Classification Technologies · Machine Learning and Data Classification · Topic Modeling
Methods: Meta Pseudo Labels
