# Delta-training: Simple Semi-Supervised Text Classification using Pretrained Word Embeddings

**Authors:** Hwiyeol Jo, Ceyda Cinarel

arXiv: 1901.07651 · 2019-10-01

## TL;DR

Delta-training is a simple semi-supervised text classification method that pairs a classifier using pretrained word embeddings with one using randomly initialized embeddings, and it outperforms self-training and co-training on 4 text classification datasets.

## Contribution

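The paper introduces an ensemble-based semi-supervised method that compares the predictions of classifiers whose word embeddings are initialized differently (randomly vs. pretrained) and uses their disagreements on unlabeled data within a self-training framework.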

## Key findings

- Outperforms self-training and co-training on 4 text classification datasets
- Robust against error accumulation
- Builds on the hypothesis that a classifier with pretrained word embeddings outperforms the same classifier with randomly initialized embeddings

## Abstract

We propose a novel and simple method for semi-supervised text classification. The method stems from the hypothesis that a classifier with pretrained word embeddings always outperforms the same classifier with randomly initialized word embeddings, as empirically observed in NLP tasks. Our method first builds two sets of classifiers as a form of model ensemble, and then initializes their word embeddings differently: one using random, the other using pretrained word embeddings. We focus on different predictions between the two classifiers on unlabeled data while following the self-training framework. We also use early-stopping in meta-epoch to improve the performance of our method. Our method, Delta-training, outperforms the self-training and the co-training framework in 4 different text classification datasets, showing robustness against error accumulation.
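The loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `select_delta` and `delta_training`, the callables `predict_random`, `predict_pretrained`, and `validate`, and the stopping rule (stop when the validation score drops) are all hypothetical. Taking the pseudo-label from the pretrained-embedding classifier when the two disagree is also an assumption, motivated by the hypothesis that this classifier is the stronger one.

```python
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, int]  # (text, label)


def select_delta(pred_random: Sequence[int],
                 pred_pretrained: Sequence[int]) -> List[Tuple[int, int]]:
    """Return (index, pseudo_label) pairs where the two classifiers disagree.

    Assumption: the pseudo-label is the pretrained-embedding model's prediction.
    """
    if len(pred_random) != len(pred_pretrained):
        raise ValueError("prediction lists must have the same length")
    return [(i, p)
            for i, (r, p) in enumerate(zip(pred_random, pred_pretrained))
            if r != p]


def delta_training(labeled: Sequence[Example],
                   unlabeled: Sequence[str],
                   predict_random: Callable[[List[Example], List[str]], List[int]],
                   predict_pretrained: Callable[[List[Example], List[str]], List[int]],
                   validate: Callable[[List[Example]], float],
                   max_meta_epochs: int = 10) -> List[Example]:
    """Self-training loop driven by disagreements (hypothetical sketch).

    Each predict_* callable is assumed to train a fresh classifier on the
    current labeled set and return its predictions on the unlabeled texts.
    validate is assumed to return a held-out score for a model trained on
    the given labeled set.
    """
    labeled, unlabeled = list(labeled), list(unlabeled)
    best_score = validate(labeled)
    for _ in range(max_meta_epochs):
        if not unlabeled:
            break
        delta = select_delta(predict_random(labeled, unlabeled),
                             predict_pretrained(labeled, unlabeled))
        if not delta:
            break  # the classifiers agree everywhere; nothing new to add
        candidate = labeled + [(unlabeled[i], y) for i, y in delta]
        score = validate(candidate)
        if score < best_score:
            break  # early stopping at the meta-epoch level (assumed criterion)
        best_score = score
        chosen = {i for i, _ in delta}
        labeled = candidate
        unlabeled = [x for i, x in enumerate(unlabeled) if i not in chosen]
    return labeled


# Toy usage with stub "classifiers" standing in for real models.
if __name__ == "__main__":
    grown = delta_training(
        labeled=[("bad", 0)],
        unlabeled=["good a", "bad b", "good c"],
        predict_random=lambda lab, unl: [0 for _ in unl],
        predict_pretrained=lambda lab, unl: [int("good" in t) for t in unl],
        validate=lambda lab: float(len(lab)),
    )
    print(grown)
```

In a real setup the two `predict_*` callables would wrap the same classifier architecture with different embedding initializations, and `validate` would score on a held-out development set.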

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1901.07651/full.md

## Figures

21 figures with captions in the complete paper: https://tomesphere.com/paper/1901.07651/full.md

## References

20 references — full list in the complete paper: https://tomesphere.com/paper/1901.07651/full.md

---
Source: https://tomesphere.com/paper/1901.07651