Interpretable Topic Extraction and Word Embedding Learning using row-stochastic DEDICOM
Lars Hillebrand, David Biesner, Christian Bauckhage, Rafet Sifa

TL;DR
This paper introduces a row-stochastic variation of the DEDICOM matrix factorization that, applied to the pointwise mutual information matrix of a text corpus, extracts interpretable topics and learns word embeddings at the same time.
Contribution
The paper presents a new row-stochastic variation of DEDICOM, applied to pointwise mutual information (PMI) matrices for simultaneous topic extraction and word embedding learning, together with a method for efficiently training the constrained model.
Findings
Effective identification of latent topic clusters
Generation of interpretable word embeddings
Qualitative evaluation shows promising results
Abstract
The DEDICOM algorithm provides a uniquely interpretable matrix factorization method for symmetric and asymmetric square matrices. We employ a new row-stochastic variation of DEDICOM on the pointwise mutual information matrices of text corpora to identify latent topic clusters within the vocabulary and simultaneously learn interpretable word embeddings. We introduce a method to efficiently train a constrained DEDICOM algorithm and a qualitative evaluation of its topic modeling and word embedding performance.
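To make the idea in the abstract concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes the standard DEDICOM form S ≈ A R Aᵀ, where S is the square PMI matrix, A has one row per word, and R is a small square matrix relating the latent topics. Row-stochasticity of A (non-negative rows that sum to 1) is enforced here by a row-wise softmax over unconstrained parameters, and the model is fitted with plain gradient descent on the squared reconstruction error. Both choices are assumptions made for illustration; the function names, the toy co-occurrence counts, and the hyperparameters are hypothetical and do not come from the paper.

```python
import numpy as np


def pmi_matrix(cooc):
    """Pointwise mutual information from a square co-occurrence count matrix."""
    p_ij = cooc / cooc.sum()
    p_i = p_ij.sum(axis=1, keepdims=True)
    p_j = p_ij.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_ij / (p_i * p_j))
    # Assumption: word pairs that never co-occur get PMI 0 instead of -inf.
    pmi[~np.isfinite(pmi)] = 0.0
    return pmi


def row_softmax(M):
    """Map each row of M to a probability vector (non-negative, sums to 1)."""
    Z = np.exp(M - M.max(axis=1, keepdims=True))
    return Z / Z.sum(axis=1, keepdims=True)


def fit_row_stochastic_dedicom(S, k, steps=500, lr=0.01, seed=0):
    """Fit S ~ A @ R @ A.T with row-stochastic A by gradient descent.

    Illustrative stand-in for the paper's training method, which is not
    described in this summary.
    """
    rng = np.random.default_rng(seed)
    n = S.shape[0]
    M = rng.normal(scale=0.1, size=(n, k))  # unconstrained parameters behind A
    R = rng.normal(scale=0.1, size=(k, k))  # topic-to-topic affinity matrix
    for _ in range(steps):
        A = row_softmax(M)
        E = A @ R @ A.T - S  # reconstruction residual
        # Gradients of 0.5 * ||E||_F^2 with respect to A and R.
        grad_A = E @ A @ R.T + E.T @ A @ R
        grad_R = A.T @ E @ A
        # Chain rule through the row-wise softmax.
        grad_M = A * (grad_A - (grad_A * A).sum(axis=1, keepdims=True))
        M -= lr * grad_M
        R -= lr * grad_R
    return row_softmax(M), R


# Hypothetical toy co-occurrence counts for a four-word vocabulary.
cooc = np.array(
    [[10, 8, 1, 0],
     [8, 12, 0, 1],
     [1, 0, 9, 7],
     [0, 1, 7, 11]],
    dtype=float,
)
S = pmi_matrix(cooc)
A, R = fit_row_stochastic_dedicom(S, k=2)
```

Under this parameterization each row of `A` can be read both as a word embedding and as a distribution over `k` topics, which is the sense in which the embeddings are interpretable: the largest entry in a row names the topic the word belongs to most strongly.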
