A novel sentence embedding based topic detection method for micro-blog
Cong Wan, Shan Jiang, Cuirong Wang, Cong Wang, Changming Xu, Xianxia Chen, Ying Yuan

TL;DR
This paper introduces a neural network-based unsupervised sentence embedding method combined with a relationship-aware clustering algorithm to effectively detect topics in micro-blog datasets without prior knowledge of topic count.
Contribution
It presents a novel weighted power mean sentence embedding model with attention, and an improved clustering algorithm RADBSCAN for micro-blog topic detection.
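As a rough illustration of what a weighted power mean sentence embedding could look like, here is a minimal NumPy sketch. It is not the authors' model: the attention query (the plain mean of the word vectors), the set of powers, and the function names `attention_weights`, `weighted_power_mean`, and `embed_sentence` are all assumptions made for this example. Only odd positive integer powers and the limits ±inf are handled, so that the element-wise root stays real for negative vector components.

```python
import numpy as np

def attention_weights(word_vecs, query):
    """Softmax over dot-product scores (assumed scoring function)."""
    scores = word_vecs @ query
    scores = scores - scores.max()  # subtract max for numerical stability
    w = np.exp(scores)
    return w / w.sum()

def weighted_power_mean(word_vecs, weights, p):
    """Element-wise weighted power mean over the word axis.

    p = +inf / -inf reduce to max / min (positive weights drop out in the limit).
    Otherwise p must be an odd positive integer so signs are preserved.
    """
    if p == np.inf:
        return word_vecs.max(axis=0)
    if p == -np.inf:
        return word_vecs.min(axis=0)
    if p < 1 or p % 2 != 1:
        raise ValueError("this sketch supports odd positive integer p or +/-inf")
    m = (weights[:, None] * word_vecs ** p).sum(axis=0)
    return np.sign(m) * np.abs(m) ** (1.0 / p)  # sign-preserving p-th root

def embed_sentence(word_vecs, powers=(1, 3, np.inf, -np.inf)):
    """Concatenate several weighted power means into one sentence vector."""
    query = word_vecs.mean(axis=0)  # hypothetical attention query
    w = attention_weights(word_vecs, query)
    return np.concatenate([weighted_power_mean(word_vecs, w, p) for p in powers])
```

With `word_vecs` of shape `(n_words, dim)`, `embed_sentence` returns a vector of length `dim * len(powers)`. The resulting vectors can then be passed to any clustering algorithm.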
Findings
Embedding method outperforms baselines in sentence clustering
RADBSCAN successfully discovers dataset-specific topics
The approach extracts relevant keywords for each detected topic
Abstract
Topic detection is a challenging task, especially when the exact number of topics is unknown. In this paper, we present a novel neural-network-based approach to detecting topics in a micro-blogging dataset. We use an unsupervised neural sentence embedding model to map the blogs to an embedding space. Our model is a weighted power mean word embedding model whose weights are computed by an attention mechanism. Experimental results show that our embedding method performs better than the baselines in sentence clustering. In addition, we propose an improved clustering algorithm referred to as relationship-aware DBSCAN (RADBSCAN). It can discover topics from a micro-blogging dataset, and the number of topics depends on the characteristics of the dataset itself. Moreover, to address the problem of parameter sensitivity, we use the blog forwarding relationship as a bridge between two independent clusters. Finally, we…
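One way to read the "bridge" idea in the abstract is as a post-processing step: after a density-based clustering pass, two clusters are merged whenever a forwarding relationship links a blog in one to a blog in the other. The sketch below illustrates that reading only. The abstract does not specify the actual RADBSCAN procedure, so the function name `merge_by_forwarding`, the `(i, j)` index-pair input format, and the choice to leave noise points untouched are assumptions. Labels use `-1` for noise, following the scikit-learn DBSCAN convention.

```python
def merge_by_forwarding(labels, forward_pairs):
    """Merge cluster labels that are linked by a forwarding relationship.

    labels        : list of cluster ids per blog, -1 meaning noise
    forward_pairs : iterable of (i, j) blog indices where blog i forwards
                    blog j (hypothetical input format)
    Returns a new label list; merged clusters take the smallest id in the group.
    """
    # Union-find over cluster ids, excluding the noise label.
    parent = {c: c for c in set(labels) if c != -1}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    for i, j in forward_pairs:
        a, b = labels[i], labels[j]
        if a == -1 or b == -1:
            continue  # assumption: noise points do not bridge clusters
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    return [-1 if c == -1 else find(c) for c in labels]


# Usage: blogs 1 and 2 sit in different clusters but are linked by a forward.
merged = merge_by_forwarding([0, 0, 1, 1, 2, -1], [(1, 2), (4, 5)])
```

Because merging happens after clustering, a neighborhood radius that is slightly too small splits a topic into fragments that the forwarding links can rejoin, which is one plausible way such a step could reduce parameter sensitivity.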
Taxonomy
Topics: Topic Modeling · Advanced Text Analysis Techniques · Web Data Mining and Analysis
