Contrastive estimation reveals topic posterior information to linear models

Christopher Tosh; Akshay Krishnamurthy; Daniel Hsu

arXiv:2003.02234 · cs.LG · March 5, 2020 · 19 cites

Open Access

TL;DR

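This paper proves that, under topic modeling assumptions, contrastive learning recovers document representations that reveal topic posterior information to linear models. It also shows empirically that linear classifiers on these representations perform well in document classification with very few labeled examples.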

Contribution

A proof that, for document classification under topic modeling assumptions, contrastive learning recovers a representation that reveals topic posterior information to linear models.

Findings

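01. Linear classifiers trained on contrastive representations perform well with very few labeled training examples.

02. Under topic modeling assumptions, contrastive learning recovers a document representation that exposes the underlying topic posterior information.

03. Experiments in a semi-supervised setup show that the approach is effective for document classification.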

Abstract

Contrastive learning is an approach to representation learning that utilizes naturally occurring similar and dissimilar pairs of data points to find useful embeddings of data. In the context of document classification under topic modeling assumptions, we prove that contrastive learning is capable of recovering a representation of documents that reveals their underlying topic posterior information to linear models. We apply this procedure in a semi-supervised setup and demonstrate empirically that linear classifiers with these representations perform well in document classification tasks with very few training examples.
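To make "naturally occurring similar and dissimilar pairs" concrete, here is a minimal sketch of how such pairs might be built from a document collection. It rests on assumptions that are not stated in the abstract: a similar pair is taken to be the two halves of one document, and a dissimilar pair combines the first half of one document with the second half of a different one. The function name `make_contrastive_pairs` and the toy documents are hypothetical.

```python
import numpy as np


def make_contrastive_pairs(docs, seed=0):
    """Build (first_half, second_half, label) triples from tokenized documents.

    label 1: both halves come from the same document (similar pair).
    label 0: the second half is borrowed from a different document
             (dissimilar pair).
    Splitting documents in half is an illustrative assumption.
    """
    rng = np.random.default_rng(seed)
    n = len(docs)
    if n < 2:
        raise ValueError("need at least two documents to form dissimilar pairs")

    firsts = [d[: len(d) // 2] for d in docs]
    seconds = [d[len(d) // 2:] for d in docs]

    pairs = []
    for i in range(n):
        # Similar pair: two halves of document i.
        pairs.append((firsts[i], seconds[i], 1))
        # Dissimilar pair: an offset in 1..n-1 guarantees j != i.
        j = (i + int(rng.integers(1, n))) % n
        pairs.append((firsts[i], seconds[j], 0))
    return pairs


# Toy corpus (hypothetical): three short tokenized documents.
docs = [
    ["goal", "match", "team", "score"],
    ["vote", "senate", "bill", "law"],
    ["gene", "cell", "protein", "dna"],
]
pairs = make_contrastive_pairs(docs)
```

A binary classifier trained to predict the label from the two halves would play the role of the contrastive estimator in this sketch. The semi-supervised step described in the abstract would then fit a linear classifier on features derived from that estimator, using only a small labeled set.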

Taxonomy

Topics: Topic Modeling · Text and Document Classification Technologies · Domain Adaptation and Few-Shot Learning