VSEC-LDA: Boosting Topic Modeling with Embedded Vocabulary Selection
Yuzhen Ding, Baoxin Li

TL;DR
VSEC-LDA introduces a dynamic, entropy-driven vocabulary selection process embedded within topic modeling to enhance the relevance and quality of discovered topics across various applications.
Contribution
The paper presents VSEC-LDA, a novel method that integrates vocabulary selection into topic modeling, improving relevance and performance over traditional pre-processing approaches.
Findings
Built-in vocabulary selection improves topic coherence.
Dynamic selection adapts to different datasets effectively.
Experimental results show enhanced model performance.
Abstract
Topic modeling has found wide application in many problems where latent structures of the data are crucial for typical inference tasks. When applying a topic model, a relatively standard pre-processing step is to first build a vocabulary of frequent words. Such a general pre-processing step is often independent of the topic modeling stage, and thus there is no guarantee that the pre-generated vocabulary can support the inference of some optimal (or even meaningful) topic models appropriate for a given task, especially for computer vision applications involving "visual words". In this paper, we propose a new approach to topic modeling, termed Vocabulary-Selection-Embedded Correspondence-LDA (VSEC-LDA), which learns the latent model while simultaneously selecting the most relevant words. The selection of words is driven by an entropy-based metric that measures the relative contribution of the…
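The abstract is cut off before it defines the entropy-based metric, so the following is only a rough sketch of the general idea of entropy-driven word scoring, not the paper's method. It assumes a fixed topic-word matrix and a uniform prior over topics, and it scores each word by the entropy of p(topic | word): a word concentrated in few topics gets a low entropy and is kept. The function names `topic_entropy_per_word` and `select_vocabulary` are hypothetical, and unlike VSEC-LDA, which selects words while the model is being learned, this sketch scores words after the fact.

```python
import math

def topic_entropy_per_word(topic_word):
    """Return H[p(topic | word)] in nats for each word.

    topic_word: list of K rows, each a list of V probabilities p(word | topic).
    Assumes a uniform prior over topics (an illustrative simplification,
    not something stated in the source).
    """
    n_topics = len(topic_word)
    n_words = len(topic_word[0])
    entropies = []
    for w in range(n_words):
        column = [row[w] for row in topic_word]
        total = sum(column)
        if total == 0.0:
            # No topic generates this word: treat it as maximally uninformative.
            entropies.append(math.log(n_topics))
            continue
        # With a uniform topic prior, p(topic | word) is the normalized column.
        posterior = [p / total for p in column]
        entropies.append(-sum(p * math.log(p) for p in posterior if p > 0.0))
    return entropies

def select_vocabulary(topic_word, n_keep):
    """Keep the n_keep word indices with the lowest topic entropy."""
    entropies = topic_entropy_per_word(topic_word)
    ranked = sorted(range(len(entropies)), key=lambda w: entropies[w])
    return sorted(ranked[:n_keep])

# Toy example: words 0 and 1 are topic-specific, word 2 is shared equally.
topic_word = [
    [0.7, 0.1, 0.2],  # p(word | topic 0)
    [0.1, 0.7, 0.2],  # p(word | topic 1)
]
kept = select_vocabulary(topic_word, n_keep=2)
```

In this toy example the word shared equally by both topics has the highest entropy and is the one dropped, which is the behavior a frequency-only vocabulary cannot provide.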
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Topic Modeling
