A comprehensive study on Frequent Pattern Mining and Clustering categories for topic detection in Persian text stream
Elnaz Zafarani-Moattar, Mohammad Reza Kangavari, Amir Masoud Rahmani

TL;DR
This study evaluates and adapts various topic detection algorithms, including a novel hybrid approach, for Persian social media texts, and introduces a new evaluation criterion to improve understanding and clustering of topics.
Contribution
It adapts and compares ten topic detection methods for Persian, proposes a hybrid category, and introduces a new evaluation metric, FS, for better performance assessment.
Findings
Hybrid methods excel in keyword-topic detection.
Frequent pattern methods are better for clustering.
The new evaluation criterion, FS, enhances assessment accuracy.
Abstract
Topic detection is a complex, language-dependent process because it requires analyzing text. Few studies have addressed topic detection in Persian, and the existing algorithms are not remarkable. This study therefore has three objectives: 1) to conduct an extensive study of the best algorithms for topic detection, 2) to identify the adaptations needed to make these algorithms suitable for the Persian language, and 3) to evaluate their performance on Persian social network texts. To achieve these objectives, we formulated two research questions. First, given the lack of research in Persian, what modifications should be made to existing frameworks, especially those developed for English, to make them compatible with Persian? Second, how do these algorithms perform, and which one is superior? There are various topic…
Taxonomy
Topics: Advanced Text Analysis Techniques · Web Data Mining and Analysis · Text and Document Classification Technologies
