Constrained Non-negative Matrix Factorization for Guided Topic Modeling of Minority Topics
Seyedeh Fatemeh Ebrahimi, Jaakko Peltonen

TL;DR
This paper introduces a constrained NMF approach for guided topic modeling that effectively identifies minority, domain-specific topics like mental health in online comments without requiring detailed pre-specification of topic divisions.
Contribution
It proposes a novel constrained NMF method incorporating seed words and prevalence constraints, enabling discovery of minority topics without detailed expert guidance.
Findings
Outperforms baselines on synthetic data in purity and mutual information
Successfully identifies mental health topics in YouTube comments
Uses KKT conditions with multiplicative updates for fitting
Abstract
Topic models often fail to capture low-prevalence, domain-critical themes, so-called minority topics, such as mental health themes in online comments. While some existing methods can incorporate domain knowledge, such as expected topical content, methods allowing guidance may require overly detailed expected topics, hindering the discovery of topic divisions and variation. We propose a topic modeling solution via a specially constrained NMF. We incorporate a seed word list characterizing minority content of interest, but we do not require experts to pre-specify their division across minority topics. Through prevalence constraints on minority topics and seed word content across topics, we learn distinct data-driven minority topics as well as majority topics. The constrained NMF is fitted via Karush-Kuhn-Tucker (KKT) conditions with multiplicative updates. We outperform several baselines…
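As a rough illustration of the fitting idea mentioned in the abstract, the sketch below runs standard multiplicative updates for Frobenius-norm NMF and adds one simple, hypothetical constraint: seed words are forced to zero weight in the majority topics, so seed-word content can only be explained by minority topics. This is not the authors' formulation. Their prevalence constraints and exact objective are not given in this summary, and the names `masked_nmf`, `n_minority`, and `seed_idx` are assumptions made for the example.

```python
import numpy as np

def masked_nmf(X, n_topics, n_minority, seed_idx, n_iter=200, eps=1e-10, rng_seed=0):
    """Sketch: NMF X ~ W @ H with seed words zeroed out in majority topics.

    X          : (n_docs, n_words) non-negative document-term matrix
    n_topics   : total number of topics (rows of H)
    n_minority : the first n_minority topics are treated as minority topics
    seed_idx   : column indices of the seed words (hypothetical input format)
    """
    rng = np.random.default_rng(rng_seed)
    n_docs, n_words = X.shape
    W = rng.random((n_docs, n_topics)) + eps   # document-topic weights
    H = rng.random((n_topics, n_words)) + eps  # topic-word weights

    # Assumed hard constraint: majority topics carry no seed-word weight.
    mask = np.ones_like(H)
    mask[n_minority:, seed_idx] = 0.0
    H *= mask

    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative; these are the
        # standard rules for the unconstrained squared-error objective.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        H *= mask  # entries set to zero stay zero; reapplied for clarity
        W *= (X @ H.T) / (W @ (H @ H.T) + eps)
    return W, H

def reconstruction_error(X, W, H):
    """Frobenius norm of the residual X - W @ H."""
    return float(np.linalg.norm(X - W @ H))
```

A multiplicative update never moves a zero entry away from zero, which is why a hard zero mask combines naturally with this family of algorithms. Softer constraints, such as the prevalence constraints the abstract refers to, would change the update rules themselves.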
Taxonomy
Topics: Mental Health via Writing · Sentiment Analysis and Opinion Mining · Complex Network Analysis Techniques
