# Variational Pretraining for Semi-supervised Text Classification

**Authors:** Suchin Gururangan, Tam Dang, Dallas Card, and Noah A. Smith

arXiv: 1906.02242 · 2019-06-07

## TL;DR

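VAMPIRE is a lightweight pretraining framework for semi-supervised text classification that trains a unigram document model as a variational autoencoder on in-domain, unlabeled data. In low-resource settings it compares favorably with computationally expensive contextual embeddings and other popular semi-supervised baselines.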

## Contribution

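The paper introduces VAMPIRE, a pretraining method that fits a unigram document model as a variational autoencoder on in-domain, unlabeled data and uses its internal states as features in a downstream classifier. It targets text classification when labeled data and computing resources are limited.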

## Key findings

- VAMPIRE shows relative strength against computationally expensive contextual embeddings and other popular semi-supervised baselines in low-resource settings.
- Fine-tuning contextual embeddings on in-domain data is crucial to achieving decent performance when supervision is limited.
- VAMPIRE is computationally efficient and effective for semi-supervised text classification.

## Abstract

We introduce VAMPIRE, a lightweight pretraining framework for effective text classification when data and computing resources are limited. We pretrain a unigram document model as a variational autoencoder on in-domain, unlabeled data and use its internal states as features in a downstream classifier. Empirically, we show the relative strength of VAMPIRE against computationally expensive contextual embeddings and other popular semi-supervised baselines under low resource settings. We also find that fine-tuning to in-domain data is crucial to achieving decent performance from contextual embeddings when working with limited supervision. We accompany this paper with code to pretrain and use VAMPIRE embeddings in downstream tasks.
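The pretraining idea described in the abstract can be illustrated with a minimal sketch of a bag-of-words variational autoencoder. This is not the authors' released code: the class name `UnigramVAE`, the single linear encoder, the standard normal prior, the softmax over the latent vector, and the choice of features are all assumptions made for illustration, and only the forward pass (no training loop) is shown.

```python
import numpy as np


def softmax(x):
    # Numerically stable softmax over the last axis.
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


def log_softmax(x):
    # Numerically stable log-softmax over the last axis.
    x = x - x.max(axis=-1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))


class UnigramVAE:
    """Toy bag-of-words VAE (hypothetical names; forward pass only)."""

    def __init__(self, vocab_size, latent_dim, seed=0):
        self.rng = np.random.default_rng(seed)
        # Assumption: one linear layer each for the posterior mean and log-variance.
        self.W_mu = self.rng.normal(0.0, 0.1, (vocab_size, latent_dim))
        self.W_logvar = self.rng.normal(0.0, 0.1, (vocab_size, latent_dim))
        # Decoder maps the latent document vector back to vocabulary logits.
        self.W_dec = self.rng.normal(0.0, 0.1, (latent_dim, vocab_size))
        self.b_dec = np.zeros(vocab_size)

    def encode(self, counts):
        # Word counts -> parameters of a diagonal Gaussian posterior.
        return counts @ self.W_mu, counts @ self.W_logvar

    def features(self, counts):
        # Assumption: use the softmax of the posterior mean as document
        # features for a downstream classifier.
        mu, _ = self.encode(counts)
        return softmax(mu)

    def neg_elbo(self, counts):
        mu, logvar = self.encode(counts)
        eps = self.rng.standard_normal(mu.shape)
        z = mu + np.exp(0.5 * logvar) * eps  # reparameterization trick
        theta = softmax(z)
        log_probs = log_softmax(theta @ self.W_dec + self.b_dec)
        # Unigram reconstruction term: each word is scored independently.
        recon = -(counts * log_probs).sum(axis=-1)
        # KL divergence from the posterior to a standard normal prior.
        kl = 0.5 * (np.exp(logvar) + mu ** 2 - 1.0 - logvar).sum(axis=-1)
        return recon + kl


# Two toy documents over a five-word vocabulary.
counts = np.array([[2, 0, 1, 0, 0], [0, 3, 0, 1, 1]], dtype=float)
model = UnigramVAE(vocab_size=5, latent_dim=3)
loss = model.neg_elbo(counts)     # one loss value per document
feats = model.features(counts)    # one feature vector per document
```

In a real pipeline the loss would be minimized over unlabeled in-domain documents with an automatic differentiation library, and the resulting features would then be passed to a classifier trained on the small labeled set.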

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.02242/full.md

## Figures

11 figures with captions in the complete paper: https://tomesphere.com/paper/1906.02242/full.md

## References

45 references — full list in the complete paper: https://tomesphere.com/paper/1906.02242/full.md

---
Source: https://tomesphere.com/paper/1906.02242