Latent-Variable Generative Models for Data-Efficient Text Classification
Xiaoan Ding, Kevin Gimpel

TL;DR
This paper introduces discrete latent variables into generative text classifiers, enhancing data efficiency and robustness, and demonstrates improved performance over traditional models on multiple datasets, especially with limited data.
Contribution
The paper integrates discrete latent variables into generative text classifiers, trains them by gradient-based maximization of the log marginal likelihood, and shows significant performance gains in small-data scenarios.
Findings
Latent variables improve classification accuracy in small-data settings.
The model captures interpretable data properties through the latent space.
Using the latent variable as an auxiliary conditioning variable when generating the text improves performance significantly.
Abstract
Generative classifiers offer potential advantages over their discriminative counterparts, namely in the areas of data efficiency, robustness to data shift and adversarial examples, and zero-shot learning (Ng and Jordan, 2002; Yogatama et al., 2017; Lewis and Fan, 2019). In this paper, we improve generative text classifiers by introducing discrete latent variables into the generative story, and explore several graphical model configurations. We parameterize the distributions using standard neural architectures used in conditional language modeling and perform learning by directly maximizing the log marginal likelihood via gradient-based optimization, which avoids the need to do expectation-maximization. We empirically characterize the performance of our models on six text classification datasets. The choice of where to include the latent variable has a significant impact on performance,…
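To make the marginalization described in the abstract concrete, here is a minimal sketch in Python. It assumes one particular configuration, in which a discrete latent variable c is drawn independently of the label y and the text x is generated conditioned on both, so that p(x, y) = sum over c of p(y) p(c) p(x | y, c). The paper explores several configurations, and this may not match any of them exactly. The function names and the toy log-probabilities are hypothetical; in a real model, log p(x | y, c) would come from a conditional language model, and training would minimize the negative of this log marginal likelihood by gradient descent.

```python
import math

def logsumexp(values):
    # Numerically stable log(sum(exp(v))) over a list of floats.
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_joint(log_p_y, log_p_c, log_p_x_given_yc, y):
    # log p(x, y) = log sum_c p(y) p(c) p(x | y, c), summing out the latent c.
    terms = [
        log_p_y[y] + log_p_c[c] + log_p_x_given_yc[y][c]
        for c in range(len(log_p_c))
    ]
    return logsumexp(terms)

def classify(log_p_y, log_p_c, log_p_x_given_yc):
    # Predict the label whose marginalized joint score is highest.
    scores = [
        log_joint(log_p_y, log_p_c, log_p_x_given_yc, y)
        for y in range(len(log_p_y))
    ]
    pred = max(range(len(scores)), key=scores.__getitem__)
    return pred, scores

# Toy example (made-up numbers): 2 labels, 3 latent values, one input text x.
log_p_y = [math.log(0.5), math.log(0.5)]   # label prior
log_p_c = [math.log(1.0 / 3.0)] * 3        # uniform latent prior (assumption)
log_p_x_given_yc = [                       # stand-in for language model scores
    [-10.0, -12.0, -11.0],                 # log p(x | y=0, c) for c = 0, 1, 2
    [-9.0, -15.0, -14.0],                  # log p(x | y=1, c) for c = 0, 1, 2
]

pred, scores = classify(log_p_y, log_p_c, log_p_x_given_yc)
```

Because the latent variable is discrete and takes few values, the sum over c can be computed exactly, which is what allows the log marginal likelihood to be optimized directly without expectation-maximization.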
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Computational and Text Analysis Methods
