Interpretable Topic Extraction and Word Embedding Learning using row-stochastic DEDICOM

Lars Hillebrand; David Biesner; Christian Bauckhage; Rafet Sifa

arXiv:2507.16695·cs.CL·July 23, 2025

TL;DR

This paper applies a row-stochastic variant of the DEDICOM matrix factorization to text data, extracting interpretable topics and learning word embeddings in a single model.

Contribution

A new row-stochastic variation of DEDICOM, applied to the pointwise mutual information (PMI) matrices of text corpora for simultaneous topic extraction and word embedding learning, together with a method for training the constrained model efficiently.

Findings

01 Effective identification of latent topic clusters

02 Generation of interpretable word embeddings

03 Qualitative evaluation shows promising results

Abstract

The DEDICOM algorithm provides a uniquely interpretable matrix factorization method for symmetric and asymmetric square matrices. We employ a new row-stochastic variation of DEDICOM on the pointwise mutual information matrices of text corpora to identify latent topic clusters within the vocabulary and simultaneously learn interpretable word embeddings. We introduce a method to efficiently train a constrained DEDICOM algorithm and a qualitative evaluation of its topic modeling and word embedding performance.
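
To make the idea in the abstract concrete, here is a minimal NumPy sketch, not the authors' implementation. It builds a positive PMI matrix S from a toy co-occurrence count matrix and fits the DEDICOM form S ≈ A R Aᵀ, where each row of the loading matrix A is constrained to be a probability distribution over k topics. Several details are assumptions made for illustration: clipping PMI at zero, enforcing the row-stochastic constraint through a row-wise softmax over an unconstrained matrix `z`, the squared Frobenius loss, plain gradient descent, and all function names, hyperparameters, and counts.

```python
import numpy as np


def ppmi(counts):
    """Positive PMI of a square co-occurrence count matrix (clipping is an assumption)."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    expected = row * col / total
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts / expected)
    pmi[~np.isfinite(pmi)] = 0.0  # zero counts carry no association signal
    return np.maximum(pmi, 0.0)


def row_softmax(z):
    """Map each row of z to a non-negative vector that sums to one."""
    z = z - z.max(axis=1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def fit_row_stochastic_dedicom(s, k, steps=500, lr=0.01, seed=0):
    """Fit s ~ a @ r @ a.T with row-stochastic a via plain gradient descent.

    Hypothetical helper: the optimizer and hyperparameters are illustrative only.
    """
    rng = np.random.default_rng(seed)
    n = s.shape[0]
    z = rng.normal(scale=0.1, size=(n, k))  # unconstrained parameters behind a
    r = rng.normal(scale=0.1, size=(k, k))  # topic-to-topic affinity matrix
    losses = []
    for _ in range(steps):
        a = row_softmax(z)
        err = a @ r @ a.T - s
        losses.append(float((err ** 2).sum()))  # squared Frobenius loss
        grad_a = 2.0 * (err @ a @ r.T + err.T @ a @ r)
        grad_r = 2.0 * (a.T @ err @ a)
        # Backpropagate through the row-wise softmax.
        grad_z = a * (grad_a - (grad_a * a).sum(axis=1, keepdims=True))
        z -= lr * grad_z
        r -= lr * grad_r
    return row_softmax(z), r, losses


# Made-up counts for a four-word vocabulary with two loose word pairs.
counts = np.array([
    [0, 8, 1, 0],
    [8, 0, 0, 1],
    [1, 0, 0, 9],
    [0, 1, 9, 0],
])
a, r, losses = fit_row_stochastic_dedicom(ppmi(counts), k=2)
```

Under these assumptions, row i of `a` serves both as the embedding of word i and as its distribution over the k topics, which is where the interpretability comes from; `r` describes how strongly the topics relate to one another.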
