Unleashing the Power of Hashtags in Tweet Analytics with Distributed Framework on Apache Storm
Vibhuti Gupta, Rattikorn Hewett

TL;DR
This paper presents a real-time distributed framework using Apache Storm for tweet topic classification based on hashtags, achieving high accuracy and increased throughput in processing large-scale Twitter data.
Contribution
It introduces a novel distributed online approach for tweet classification that incrementally updates Naive Bayes predictors on a real-time streaming platform.
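To make the idea of incrementally updated Naive Bayes predictors concrete, here is a minimal single-process sketch in Python. It is not the authors' implementation: it does not use Apache Storm and it omits the paper's selection of strong predictors. The class name `OnlineNaiveBayes`, its methods, the tokenizer, and the example topics are all hypothetical, chosen only for illustration. It shows a multinomial Naive Bayes model whose counts are updated one tweet at a time, so no retraining pass over earlier tweets is needed.

```python
import math
import re
from collections import Counter, defaultdict


class OnlineNaiveBayes:
    """Multinomial Naive Bayes updated one labeled tweet at a time (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # Laplace smoothing constant (assumed value)
        self.class_counts = Counter()            # number of tweets seen per topic
        self.word_counts = defaultdict(Counter)  # token counts per topic
        self.vocab = set()                       # all tokens seen so far

    @staticmethod
    def tokenize(text):
        # Keep hashtags such as "#football" as single tokens.
        return re.findall(r"[#\w']+", text.lower())

    def update(self, text, topic):
        """Fold one labeled tweet into the running counts."""
        tokens = self.tokenize(text)
        self.class_counts[topic] += 1
        self.word_counts[topic].update(tokens)
        self.vocab.update(tokens)

    def predict(self, text):
        """Return the topic with the highest smoothed log-posterior, or None if untrained."""
        tokens = self.tokenize(text)
        total = sum(self.class_counts.values())
        vocab_size = max(len(self.vocab), 1)
        best_topic, best_score = None, -math.inf
        for topic, n_tweets in self.class_counts.items():
            counts = self.word_counts[topic]
            denom = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(n_tweets / total)   # log prior
            for token in tokens:
                score += math.log((counts[token] + self.alpha) / denom)
            if score > best_score:
                best_topic, best_score = topic, score
        return best_topic


if __name__ == "__main__":
    model = OnlineNaiveBayes()
    model.update("great goal in the #football match tonight", "sports")
    model.update("new #election poll shows tight race", "politics")
    print(model.predict("what a goal in the match"))
```

In a streaming deployment, the `update` and `predict` steps would presumably run inside the processing units of the streaming platform, with the counts partitioned or shared across workers. How the paper distributes this state is not described in the summary above.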
Findings
Achieved up to 97% classification accuracy.
Real-time processing with 37% throughput improvement.
Effective handling of short, context-dependent tweets.
Abstract
Twitter is a popular social network platform where users can interact and post texts of up to 280 characters called tweets. Hashtags, hyperlinked words in tweets, have increasingly become crucial for tweet retrieval and search. Using hashtags for tweet topic classification is a challenging problem because of context dependence among words, slang, abbreviations, and emoticons in a short tweet, along with the evolving use of hashtags. Since Twitter generates millions of tweets daily, tweet analytics is a fundamental big data stream problem that often requires real-time distributed processing. This paper proposes a distributed online approach to tweet topic classification with hashtags. Implemented on Apache Storm, a distributed real-time framework, our approach incrementally identifies and updates a set of strong predictors in the Naïve Bayes model for classifying each incoming…
