A Rare Topic Discovery Model for Short Texts Based on Co-occurrence word Network
Chengjie Ma, Junping Du, Yingxia Shao, Ang Li, Zeli Guan

TL;DR
This paper introduces CWIBTD, a model based on co-occurrence word networks that discovers rare topics in unbalanced short-text datasets, improving both sensitivity to those topics and the semantic density of the data space.
Contribution
The paper presents a new co-occurrence network-based model for rare topic discovery in short texts, addressing sparsity and imbalance more effectively than previous methods.
Findings
CWIBTD outperforms baseline approaches in discovering rare topics.
The model enhances sensitivity to scarce topics by improving node activity calculation.
Experimental results validate the effectiveness of CWIBTD on unbalanced short-text datasets.
Abstract
We provide a simple and general solution for the discovery of scarce topics in unbalanced short-text datasets, namely, a word co-occurrence network-based model, CWIBTD, which can simultaneously address the sparsity and unbalance of short-text topics and attenuate the effect of occasional pairwise occurrences of words, allowing the model to focus more on the discovery of scarce topics. Unlike previous approaches, CWIBTD uses co-occurrence word networks to model the topic distribution of each word, which improves the semantic density of the data space. It ensures sensitivity in identifying rare topics by improving the way node activity is calculated and by normalizing scarce topics and large topics to some extent. In addition, using the same Gibbs sampling as LDA makes CWIBTD easy to extend to various application scenarios. Extensive experimental validation in the unbalanced…
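To make the idea of a word co-occurrence network concrete, here is a minimal sketch in Python. It is not the CWIBTD implementation: the function name `build_cooccurrence_network`, the whitespace tokenization, and the `min_count` threshold are all assumptions made for illustration. The threshold is only a crude stand-in for attenuating occasional pairwise occurrences; the paper's node activity calculation is not reproduced here.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_network(docs, min_count=2):
    """Return {(word_a, word_b): count} for word pairs that co-occur
    in at least `min_count` documents (hypothetical helper)."""
    edge_counts = Counter()
    for doc in docs:
        # Each short text contributes every unordered pair of its distinct words once.
        words = sorted(set(doc.lower().split()))
        edge_counts.update(combinations(words, 2))
    # Drop pairs seen fewer than min_count times (assumed way to damp chance pairings).
    return {pair: n for pair, n in edge_counts.items() if n >= min_count}


# Toy corpus: a larger "phone" topic and a small "orchid" topic.
docs = [
    "apple releases new phone",
    "new phone from apple",
    "rare orchid blooms",
    "rare orchid found",
    "apple orchid",
]
network = build_cooccurrence_network(docs, min_count=2)
for pair, count in sorted(network.items()):
    print(pair, count)
```

In this toy corpus the one-off pair ("apple", "orchid") is filtered out, while the repeated pair ("orchid", "rare") survives alongside the edges of the larger topic.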
Taxonomy
Topics: Topic Modeling · Advanced Text Analysis Techniques · Complex Network Analysis Techniques
Methods: Latent Dirichlet Allocation
