Sparse Topical Coding

Jun Zhu; Eric P. Xing

arXiv:1202.3778·cs.LG·February 20, 2012

Sparse Topical Coding

Jun Zhu, Eric P. Xing

PDF

TL;DR

Sparse Topical Coding (STC) is a non-probabilistic topic modeling approach that allows direct control of sparsity, integrates with supervised learning, and is efficiently optimized, outperforming traditional probabilistic models.

Contribution

Introduces STC, a novel non-probabilistic framework for topic modeling that relaxes normalization constraints and enables sparsity control and seamless supervised learning integration.

Findings

01

STC effectively identifies meaningful topics and improves classification accuracy.

02

STC offers computational efficiency over probabilistic models.

03

Supervised MedSTC enhances topical relevance and predictive performance.

Abstract

We present sparse topical coding (STC), a non-probabilistic formulation of topic models for discovering latent representations of large collections of data. Unlike probabilistic topic models, STC relaxes the normalization constraint of admixture proportions and the constraint of defining a normalized likelihood function. Such relaxations make STC amenable to: 1) directly control the sparsity of inferred representations by using sparsity-inducing regularizers; 2) be seamlessly integrated with a convex error function (e.g., SVM hinge loss) for supervised learning; and 3) be efficiently learned with a simply structured coordinate descent algorithm. Our results demonstrate the advantages of STC and supervised MedSTC on identifying topical meanings of words and improving classification accuracy and time efficiency.

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.