TopicsRanksDC: Distance-based Topic Ranking applied on Two-Class Data
Malik Yousef, Jamal Al Qundus, Silvio Peikert, and Adrian Paschke

TL;DR
TopicsRanksDC is a new method that ranks text topics based on the distance between class-specific clusters, effectively identifying significant topics for class separation in two-class datasets.
Contribution
It introduces a distance-based ranking approach for LDA-derived topics, enhancing the identification of relevant topics in two-class text data.
Findings
LDA topics have higher rank scores than random topics.
Distance metrics effectively distinguish significant topics.
Promising results for future search engine applications.
Abstract
In this paper, we introduce a novel approach named TopicsRanksDC for topic ranking based on the distance between two clusters that are generated by each topic. We assume that our data consists of text documents that are associated with two classes. Our approach ranks each topic contained in these text documents by its significance for separating the two classes. First, the algorithm detects topics using Latent Dirichlet Allocation (LDA). The words defining each topic are represented as two clusters, where each one is associated with one of the classes. We compute four distance metrics: Single Linkage, Complete Linkage, Average Linkage, and the distance between the centroids. We compare the results of LDA topics and random topics. The results show that the rank for LDA topics is much higher than that for random topics. The results of the TopicsRanksDC tool are promising for future work to enable search…
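The four distance metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each cluster is a list of numeric points, that distances are Euclidean, and that a larger distance means a higher rank. The names `cluster_distances` and `rank_topics` and the toy data are hypothetical.

```python
import math
from statistics import fmean


def cluster_distances(cluster_a, cluster_b):
    """Return four distances between two clusters of points.

    Assumption: points are equal-length numeric sequences and the
    underlying point-to-point distance is Euclidean.
    """
    # All point-to-point distances between the two clusters.
    pairwise = [math.dist(p, q) for p in cluster_a for q in cluster_b]
    # Centroid = per-dimension mean of the points in a cluster.
    centroid_a = [fmean(column) for column in zip(*cluster_a)]
    centroid_b = [fmean(column) for column in zip(*cluster_b)]
    return {
        "single": min(pairwise),       # closest pair
        "complete": max(pairwise),     # farthest pair
        "average": fmean(pairwise),    # mean over all pairs
        "centroid": math.dist(centroid_a, centroid_b),
    }


def rank_topics(topic_clusters, metric="average"):
    """Sort topic names by the chosen distance, largest first.

    Assumption: a larger distance between the two class clusters
    indicates a topic that separates the classes better.
    """
    scores = {
        name: cluster_distances(a, b)[metric]
        for name, (a, b) in topic_clusters.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (hypothetical): one pair of class clusters per topic.
topic_clusters = {
    "topic_far": ([[0.0, 0.0], [0.0, 1.0]], [[3.0, 0.0], [3.0, 1.0]]),
    "topic_near": ([[0.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [1.0, 1.0]]),
}
far_scores = cluster_distances(*topic_clusters["topic_far"])
ranking = rank_topics(topic_clusters, metric="average")
```

In this toy data the clusters of `topic_far` lie farther apart than those of `topic_near` under all four metrics, so `topic_far` is ranked first regardless of which metric is chosen.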
Taxonomy
Topics: Text and Document Classification Technologies · Complex Network Analysis Techniques · Advanced Text Analysis Techniques
Methods: Latent Dirichlet Allocation
