Discrete Latent Variable Representations for Low-Resource Text Classification

Shuning Jin; Sam Wiseman; Karl Stratos; Karen Livescu

arXiv:2006.06226 · cs.CL · June 12, 2020 · 1 citation

TL;DR

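This paper explores discrete latent variable models of text as a source of features for classification. The learned discrete representations are more interpretable and more compact than continuous ones, and the best models outperform previously reported results with continuous representations in low-resource settings. An amortized variant of Hard EM does particularly well in the lowest-resource regimes.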

Contribution

It compares several approaches to learning discrete latent variable models of text when exact marginalization over the latent variables is intractable, and evaluates the learned representations as features for low-resource document and sentence classification.

Findings

01

The best discrete models outperform the previous best reported results with continuous representations in low-resource settings.

02

An amortized variant of Hard EM performs particularly well in the lowest-resource regimes.

03

Discrete representations are more compressed and interpretable.

Abstract

While much work on deep latent variable models of text uses continuous latent variables, discrete latent variables are interesting because they are more interpretable and typically more space efficient. We consider several approaches to learning discrete latent variable models for text in the case where exact marginalization over these variables is intractable. We compare the performance of the learned representations as features for low-resource document and sentence classification. Our best models outperform the previous best reported results with continuous representations in these low-resource settings, while learning significantly more compressed representations. Interestingly, we find that an amortized variant of Hard EM performs particularly well in the lowest-resource regimes.
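
As a rough illustration of why discrete representations can be more space efficient, the following Python sketch compares the storage cost of a discrete code with that of a dense float vector. The sizes used (8 categorical latents with 256 values each, a 64-dimensional float32 vector) and the function names are hypothetical choices for this example, not figures or code from the paper.

```python
def discrete_code_bits(num_latents: int, num_values: int) -> int:
    """Bits needed to store `num_latents` categorical variables,
    each taking one of `num_values` values (fixed-width encoding)."""
    # (num_values - 1).bit_length() equals ceil(log2(num_values)) for num_values >= 2
    bits_per_latent = (num_values - 1).bit_length()
    return num_latents * bits_per_latent


def continuous_vector_bits(dim: int, bits_per_float: int = 32) -> int:
    """Bits needed to store a dense vector of `dim` floats."""
    return dim * bits_per_float


# Hypothetical sizes, chosen only for illustration.
discrete = discrete_code_bits(num_latents=8, num_values=256)
continuous = continuous_vector_bits(dim=64)

print(f"discrete code:     {discrete} bits")
print(f"continuous vector: {continuous} bits")
print(f"ratio:             {continuous / discrete:.0f}x")
```

Under these assumed sizes the discrete code needs 64 bits and the float vector 2048 bits. The actual ratio depends on the number of latents, the number of values per latent, and the float precision a given model uses.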

Code & Models

Repositories

shuningjin/discrete-text-rep
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Text and Document Classification Technologies · Natural Language Processing Techniques