Fast Clustering of Short Text Streams Using Efficient Cluster Indexing and Dynamic Similarity Thresholds
Md Rashadul Hasan Rakib, Muhammad Asaduzzaman

TL;DR
This paper introduces FastStream, a novel short text stream clustering method that uses efficient indexing and dynamic similarity thresholds to significantly improve speed and adaptability over existing techniques.
Contribution
FastStream is the first clustering method to combine inverted index-based cluster indexing with dynamic similarity thresholds for rapid and adaptive short text stream clustering.
Findings
FastStream outperforms existing methods in clustering accuracy.
FastStream reduces running time by several orders of magnitude.
FastStream effectively handles concept drift in short text streams.
Abstract
Short text stream clustering is an important but challenging task, since massive amounts of text are generated from sources such as micro-blogging, question-answering, and social news aggregation websites. One of the major challenges is clustering this volume of text within a reasonable amount of time. Existing state-of-the-art short text stream clustering methods cannot do so, because they compute the similarity between each incoming text and all existing clusters before assigning the text to a cluster. To overcome this challenge, we propose a fast short text stream clustering method (called FastStream) that efficiently indexes the clusters using an inverted index and computes the similarity between a text and only a selected number of clusters when assigning the text to a cluster. In this way, we not only reduce the…
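The indexing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `IndexedClusterer`, the Jaccard similarity, and the whitespace tokenizer are all assumptions made for illustration. The rule FastStream uses to set its threshold dynamically is not given in this excerpt, so a fixed `min_similarity` parameter stands in for it.

```python
from collections import defaultdict


class IndexedClusterer:
    """Toy stream clusterer that scores only clusters sharing a term with the text."""

    def __init__(self, min_similarity=0.5):
        # Assumption: a fixed threshold stands in for the paper's dynamic one.
        self.min_similarity = min_similarity
        self.clusters = []             # cluster id -> set of terms seen in that cluster
        self.index = defaultdict(set)  # inverted index: term -> ids of clusters containing it

    def candidates(self, terms):
        """Return ids of clusters that share at least one term with `terms`."""
        found = set()
        for term in terms:
            found |= self.index.get(term, set())
        return found

    def add(self, text):
        """Assign `text` to the best candidate cluster, or open a new cluster."""
        terms = set(text.lower().split())
        best_id, best_score = None, 0.0
        for cid in self.candidates(terms):
            cluster = self.clusters[cid]
            # Assumed similarity measure: Jaccard overlap of term sets.
            score = len(terms & cluster) / len(terms | cluster)
            if score > best_score:
                best_id, best_score = cid, score
        if best_id is None or best_score < self.min_similarity:
            best_id = len(self.clusters)
            self.clusters.append(set())
        # Update the cluster and the inverted index with the new terms.
        self.clusters[best_id] |= terms
        for term in terms:
            self.index[term].add(best_id)
        return best_id


clusterer = IndexedClusterer(min_similarity=0.3)
first = clusterer.add("cheap flights to paris")
second = clusterer.add("cheap paris flights deals")
third = clusterer.add("python list comprehension syntax")
```

In this sketch the third text shares no term with the first cluster, so the index returns no candidates and no similarity is computed at all before a new cluster is opened. That is the source of the saving: the work per text depends on the number of clusters sharing its terms, not on the total number of clusters.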
Taxonomy
Topics: Data Stream Mining Techniques · Bayesian Methods and Mixture Models · Advanced Clustering Algorithms Research
