A New Geometric Approach to Latent Topic Modeling and Discovery

Weicong Ding; Mohammad H. Rohban; Prakash Ishwar; Venkatesh Saligrama

arXiv:1301.0858·stat.ML·November 17, 2016·ICASSP

TL;DR

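This paper introduces a convex, geometrically motivated algorithm for latent topic discovery. It identifies the novel words unique to each topic as extreme points of the empirical cross-document word frequencies, and it performs competitively with state-of-the-art methods based on non-convex optimization on synthetic and real-world datasets.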

Contribution

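The paper presents a convex, polynomial-time algorithm for nonnegative matrix factorization, applied to latent topic modeling of text and image corpora. It avoids the suboptimal approximations, locally optimal methods, and heuristics that related non-convex approaches rely on.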

Findings

01. Algorithm is convex and polynomial-time
02. Performs competitively on synthetic datasets
03. Effective on real-world text and image data

Abstract

A new geometrically-motivated algorithm for nonnegative matrix factorization is developed and applied to the discovery of latent "topics" for text and image "document" corpora. The algorithm is based on robustly finding and clustering extreme points of empirical cross-document word-frequencies that correspond to novel "words" unique to each topic. In contrast to related approaches that are based on solving non-convex optimization problems using suboptimal approximations, locally-optimal methods, or heuristics, the new algorithm is convex, has polynomial complexity, and has competitive qualitative and quantitative performance compared to the current state-of-the-art approaches on synthetic and real-world datasets.
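The geometric idea in the abstract can be illustrated with a small sketch. It assumes a noiseless, synthetic setting in which every topic has at least one novel word (a word that occurs in that topic only). Under that assumption, the rows of the row-normalized word-by-document frequency matrix lie in the convex hull of the rows belonging to novel words, so the novel words are the extreme points. The sketch locates extreme points by maximizing random linear projections, which is a swapped-in simplification and not necessarily the procedure used in the paper. The robust handling of sampling noise and the clustering step are omitted. The function name `find_novel_words` and all parameters are hypothetical.

```python
import numpy as np


def find_novel_words(A, num_projections=200, seed=0):
    """Return sorted row indices of A that are extreme points after row normalization.

    Assumption (not from the paper): extreme points are detected by taking the
    argmax of random linear projections. A linear functional over a convex
    hull attains its maximum at an extreme point.
    """
    rng = np.random.default_rng(seed)
    # Normalize each word's cross-document frequencies to sum to one.
    X = A / A.sum(axis=1, keepdims=True)
    # One random direction per column; each column votes for one row.
    directions = rng.standard_normal((X.shape[1], num_projections))
    winners = np.argmax(X @ directions, axis=0)
    return sorted(set(winners.tolist()))


# Synthetic, noiseless example: K topics, W words, D documents.
rng = np.random.default_rng(1)
K, W, D = 3, 8, 50

# Word-by-topic matrix: words 0..K-1 are novel (one per topic),
# the remaining words appear in every topic.
beta = np.zeros((W, K))
beta[:K, :K] = np.eye(K)
beta[K:, :] = rng.uniform(0.2, 1.0, size=(W - K, K))
beta /= beta.sum(axis=0, keepdims=True)  # each topic is a distribution over words

# Topic-by-document mixing weights.
theta = rng.dirichlet(np.ones(K), size=D).T

# Expected word-by-document frequencies (no sampling noise).
A = beta @ theta

novel = find_novel_words(A)
print(novel)
```

In this idealized setting the returned indices should be the novel words 0, 1, and 2, because every other word is a strict convex combination of them. With real word counts, sampling noise perturbs the rows, which is why the abstract stresses robust detection followed by clustering of the extreme points.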
