A Case Study in Text Mining: Interpreting Twitter Data From World Cup Tweets
Daniel Godfrey, Caley Johns, Carl Meyer, Shaina Race, Carol Sadek

TL;DR
This paper applies cluster analysis to Twitter data about the World Cup: a new algorithm filters out noisy tweets, k-means and NMF are compared for topic extraction, and visualization is used to interpret the results.
Contribution
Introduces an algorithm combining DBSCAN and consensus matrices to filter irrelevant tweets and compares k-means and NMF for topic extraction from social media data.
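The summary does not spell out how DBSCAN and the consensus matrix are combined, so the following is only a minimal sketch of one plausible reading: run DBSCAN several times with different radii, count how often each pair of points lands in the same cluster, and keep only points that co-cluster with some other point in most runs. The 2-D points stand in for tweet vectors, and the names `dbscan`, `consensus_matrix`, `filter_noise`, and the `min_fraction` threshold are illustrative assumptions, not the authors' implementation.

```python
from math import dist

NOISE = -1

def dbscan(points, eps, min_pts):
    """Plain DBSCAN: one label per point, NOISE (-1) for unclustered points."""
    n = len(points)
    # Each point counts as its own neighbor, as in the usual formulation.
    neighbors = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = NOISE          # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point joins the cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # core point: keep expanding
        cluster += 1
    return labels

def consensus_matrix(points, eps_values, min_pts):
    """M[i][j] = number of DBSCAN runs in which i and j share a cluster."""
    n = len(points)
    M = [[0] * n for _ in range(n)]
    for eps in eps_values:
        labels = dbscan(points, eps, min_pts)
        for i in range(n):
            for j in range(n):
                if i != j and labels[i] != NOISE and labels[i] == labels[j]:
                    M[i][j] += 1
    return M

def filter_noise(points, eps_values, min_pts, min_fraction=0.5):
    """Assumed rule: keep a point if it co-clusters with at least one other
    point in at least min_fraction of the runs."""
    M = consensus_matrix(points, eps_values, min_pts)
    threshold = min_fraction * len(eps_values)
    return [i for i, row in enumerate(M) if max(row) >= threshold]

# Toy data: two tight groups plus two isolated outliers (indices 8 and 9).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 20), (-8, 6)]
kept = filter_noise(points, eps_values=[1.0, 1.5, 2.0], min_pts=3)
print(kept)  # → [0, 1, 2, 3, 4, 5, 6, 7]
```

Aggregating over several radii makes the filter less sensitive to any single choice of `eps`, which is the usual motivation for a consensus step.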
Findings
NMF is faster and more interpretable than k-means.
The combined filtering algorithm effectively isolates relevant tweets.
Similar clustering results were obtained with both methods.
Abstract
Cluster analysis is a field of data analysis that extracts underlying patterns in data. One application of cluster analysis is text mining, the analysis of large collections of text to find similarities between documents. We used a collection of about 30,000 tweets extracted from Twitter just before the World Cup started. A common problem with real-world text data is the presence of linguistic noise; in our case, this means extraneous tweets that are unrelated to the dominant themes. To combat this problem, we created an algorithm that combines the DBSCAN algorithm with a consensus matrix, leaving only the tweets that are related to those dominant themes. We then used cluster analysis to find the topics that the tweets describe. We clustered the tweets using k-means, a commonly used clustering algorithm, and Non-Negative Matrix Factorization (NMF), and compared the results.…
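To illustrate how NMF can expose topics in a term-by-tweet matrix, here is a small pure-Python sketch using the standard multiplicative update rules for the Frobenius objective. It factors a nonnegative matrix A (terms in rows, tweets in columns) as A ≈ WH, where each column of W weights terms for one topic and each column of H weights topics for one tweet. The toy vocabulary, the matrix, the choice of k = 2, and all function names are assumptions for illustration; the abstract does not state which NMF variant or parameters the authors used.

```python
import random

def matmul(X, Y):
    """Dense matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def frob_error(A, W, H):
    """Squared Frobenius norm of A - WH."""
    WH = matmul(W, H)
    return sum((a - b) ** 2 for ra, rb in zip(A, WH) for a, b in zip(ra, rb))

def nmf(A, k, iters=200, seed=0, eps=1e-9):
    """Multiplicative-update NMF; returns W, H and the error history."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    errors = [frob_error(A, W, H)]
    for _ in range(iters):
        # H <- H * (W^T A) / (W^T W H), elementwise
        Wt = transpose(W)
        num, den = matmul(Wt, A), matmul(matmul(Wt, W), H)
        H = [[h * p / (q + eps) for h, p, q in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # W <- W * (A H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num, den = matmul(A, Ht), matmul(W, matmul(H, Ht))
        W = [[w * p / (q + eps) for w, p, q in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
        errors.append(frob_error(A, W, H))
    return W, H, errors

# Hypothetical term-by-tweet counts: rows are terms, columns are five tweets.
terms = ["goal", "match", "brazil", "ticket", "hotel", "flight"]
A = [[1, 1, 1, 0, 0],
     [1, 1, 1, 0, 0],
     [1, 0, 1, 0, 0],
     [0, 0, 0, 1, 1],
     [0, 0, 0, 1, 1],
     [0, 0, 0, 0, 1]]

W, H, errors = nmf(A, k=2)

# Assign each tweet to its strongest topic and list each topic's top terms.
assignments = [max(range(2), key=lambda t: H[t][j]) for j in range(len(A[0]))]
for t in range(2):
    top = sorted(range(len(terms)), key=lambda i: W[i][t], reverse=True)[:3]
    print("topic", t, [terms[i] for i in top])
print("tweet topics:", assignments)
```

Because W and H are nonnegative, each topic reads directly as a weighted list of terms, which is one common reason NMF output is considered easy to interpret. A k-means run on the same matrix would instead return one centroid per cluster and a hard assignment for each tweet.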
Taxonomy
Topics: Complex Network Analysis Techniques · Web Data Mining and Analysis · Sentiment Analysis and Opinion Mining
