Sparsemax and Relaxed Wasserstein for Topic Sparsity

Tianyi Lin; Zhiyue Hu; Xin Guo

arXiv:1810.09079 · cs.LG · November 29, 2018 · 1 citation

TL;DR

This paper introduces two neural topic models that combine the Gaussian sparsemax construction with the relaxed Wasserstein divergence to capture topic sparsity in short texts such as social media posts, with the aim of improving both accuracy and training stability.

Contribution

The paper proposes two neural topic models whose posterior distributions over topics are sparse. They are built on the Gaussian sparsemax construction and the relaxed Wasserstein divergence, and are reported to improve stability and performance over existing methods.

Findings

01. Models outperform probabilistic and neural baselines

02. Effective in capturing topic sparsity in short texts

03. Demonstrated on large, diverse text corpora

Abstract

Topic sparsity refers to the observation that individual documents usually focus on a few salient topics instead of covering a wide variety of topics, and that a real topic uses a narrow range of terms instead of covering the whole vocabulary. Understanding this topic sparsity is especially important for analyzing user-generated web content and social media, which typically take the form of extremely short posts and discussions. As topic sparsity of individual documents in online social media increases, so does the difficulty of analyzing the online text sources using traditional methods. In this paper, we propose two novel neural models by providing sparse posterior distributions over topics based on the Gaussian sparsemax construction, enabling efficient training by stochastic backpropagation. We construct an inference network conditioned on the input data and infer the…
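
To make the phrase "sparse posterior distributions over topics" more concrete, here is a minimal sketch of the sparsemax projection (Martins and Astudillo, 2016) applied to a Gaussian draw. It is an illustration only, not the authors' implementation: the function names `sparsemax` and `gaussian_sparsemax_sample`, the diagonal Gaussian parameterization via `mu` and `sigma`, and the absence of any learned linear map before the projection are all assumptions made for this example.

```python
import random


def sparsemax(z):
    """Euclidean projection of the score vector z onto the probability simplex.

    Unlike softmax, the result can contain exact zeros.
    """
    z_sorted = sorted(z, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for k, z_k in enumerate(z_sorted, start=1):
        cumsum += z_k
        # The indices k that satisfy this condition form a prefix 1..k(z),
        # so the last one that passes gives the support size.
        if 1.0 + k * z_k > cumsum:
            tau = (cumsum - 1.0) / k
    return [max(z_i - tau, 0.0) for z_i in z]


def gaussian_sparsemax_sample(mu, sigma, rng):
    """Hypothetical helper: reparameterized Gaussian draw, then sparsemax.

    Assumes a diagonal Gaussian x = mu + sigma * eps with eps ~ N(0, 1).
    """
    x = [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]
    return sparsemax(x)


if __name__ == "__main__":
    # Scores for three topics; the weakest one is pushed to exactly zero.
    print(sparsemax([1.0, 0.8, 0.1]))

    # One sampled topic proportion vector over five topics.
    rng = random.Random(0)
    print(gaussian_sparsemax_sample([0.0] * 5, [1.0] * 5, rng))
```

Because the noise enters through `mu + sigma * eps`, the sample is a deterministic, piecewise-linear function of `mu` and `sigma` given `eps`. That is the property that makes training by stochastic backpropagation possible in constructions of this kind.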

Taxonomy

Topics: Topic Modeling · Text and Document Classification Technologies · Natural Language Processing Techniques

Methods: Softmax