A New Geometric Approach to Latent Topic Modeling and Discovery
Weicong Ding, Mohammad H. Rohban, Prakash Ishwar, Venkatesh Saligrama

TL;DR
This paper introduces a convex, geometrically motivated algorithm for latent topic discovery that identifies the novel words unique to each topic in polynomial time and performs competitively with state-of-the-art approaches on synthetic and real-world datasets.
Contribution
The paper presents a new convex, polynomial-complexity algorithm for nonnegative matrix factorization tailored to latent topic modeling. It works by robustly finding and clustering the extreme points of empirical cross-document word frequencies.
Findings
The algorithm is convex and has polynomial complexity
Performs competitively with state-of-the-art approaches on synthetic datasets
Performs competitively on real-world text and image corpora
Abstract
A new geometrically-motivated algorithm for nonnegative matrix factorization is developed and applied to the discovery of latent "topics" for text and image "document" corpora. The algorithm is based on robustly finding and clustering extreme points of empirical cross-document word-frequencies that correspond to novel "words" unique to each topic. In contrast to related approaches that are based on solving non-convex optimization problems using suboptimal approximations, locally-optimal methods, or heuristics, the new algorithm is convex, has polynomial complexity, and has competitive qualitative and quantitative performance compared to the current state-of-the-art approaches on synthetic and real-world datasets.
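The geometric idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes a noise-free setting in which the word-by-document frequency matrix factors exactly as `X = beta @ theta`, and it uses random linear projections as one simple way to pick out extreme points. The function names (`find_extreme_rows`, `cluster_rows`), the synthetic data, and all parameter values are hypothetical choices made for illustration. After each row of `X` is normalized to sum to one, the rows of novel words coincide with the normalized topic rows of `theta` and form the vertices of a convex hull, while rows of all other words fall inside it.

```python
import numpy as np

def find_extreme_rows(X, num_projections=200, seed=0):
    """Return indices of rows that maximize at least one random projection.

    The maximizer of a linear function over a point set is an extreme
    point of its convex hull, so every returned row is a vertex candidate.
    """
    rng = np.random.default_rng(seed)
    X_bar = X / X.sum(axis=1, keepdims=True)  # each row now sums to 1
    directions = rng.standard_normal((X_bar.shape[1], num_projections))
    winners = np.argmax(X_bar @ directions, axis=0)
    return np.unique(winners), X_bar

def cluster_rows(X_bar, indices, tol=1e-8):
    """Group row indices whose normalized rows coincide up to tol."""
    clusters = []
    for i in indices:
        for cluster in clusters:
            if np.linalg.norm(X_bar[i] - X_bar[cluster[0]]) < tol:
                cluster.append(int(i))
                break
        else:
            clusters.append([int(i)])
    return clusters

# Synthetic, noise-free corpus (all sizes are illustrative assumptions).
rng = np.random.default_rng(1)
K, W, M = 3, 12, 50                      # topics, vocabulary size, documents
beta = np.zeros((W, K))                  # word-by-topic matrix
for k in range(K):
    beta[2 * k, k] = 1.0                 # words 2k and 2k+1 are novel
    beta[2 * k + 1, k] = 0.5             # to topic k (zero elsewhere)
beta[2 * K:, :] = rng.uniform(0.2, 1.0, size=(W - 2 * K, K))  # shared words
beta /= beta.sum(axis=0, keepdims=True)  # columns are distributions
theta = rng.dirichlet(np.ones(K), size=M).T  # topic-by-document weights
X = beta @ theta                         # ideal word-by-document frequencies

novel, X_bar = find_extreme_rows(X)
clusters = cluster_rows(X_bar, novel)
print("candidate novel words:", novel.tolist())
print("clusters (one per topic):", clusters)
```

In this idealized setting every detected index is one of the planted novel words, and the clusters correspond to topics. With real, finite-length documents the empirical frequencies are noisy, exact ties disappear, and both steps would need the kind of robust treatment the abstract refers to; the fixed tolerance `tol` used here would not be adequate.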
