Determine the User Country of a Tweet
Han van der Veen, Djoerd Hiemstra, Tijs van den Broek, Michel Ehrenhard, Ariana Need

TL;DR
This paper presents a Naive Bayes classifier that predicts the country of a tweet without GPS data, reaching 82% accuracy by using user profile features such as timezone and location.
Contribution
It introduces a method for inferring the country of a tweet from user profile features, with an analysis of feature importance and error sources.
Findings
Achieved 82% accuracy in country prediction.
Timezone and parsed user location are the most informative features.
Errors often stem from limited profile information and from properties shared between countries, such as a common timezone.
Abstract
On the widely used messaging platform Twitter, about 2% of tweets contain their geographical location as exact GPS coordinates (latitude and longitude). Knowing the location of a tweet is useful for many data analytics questions. This research examines how to determine the location of tweets that do not contain GPS coordinates. An accuracy of 82% was achieved using a Naive Bayes model trained on features such as the user's timezone, the user's language, and the parsed user location. The classifier performs well on active Twitter countries such as the Netherlands and the United Kingdom. An analysis of the classifier's errors shows that mistakes were caused by limited information and by properties shared between countries, such as a shared timezone. A feature analysis was performed to see the effect of different features. The features timezone and parsed user location…
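To make the approach described in the abstract concrete, the following is a minimal sketch of a categorical Naive Bayes classifier over profile features. It is not the authors' implementation: the class name `CategoricalNaiveBayes`, the feature keys (`timezone`, `language`, `location`), the Laplace smoothing constant, and the five toy training rows are all assumptions made for illustration, and the toy data says nothing about the reported 82% accuracy.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Illustrative Naive Bayes over categorical features (hypothetical helper)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed, not from the paper)

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.total = len(labels)
        # value_counts[(feature, label)][value] = number of training rows
        self.value_counts = defaultdict(Counter)
        # vocab[feature] = set of values seen for that feature in training
        self.vocab = defaultdict(set)
        for row, label in zip(rows, labels):
            for feature, value in row.items():
                self.value_counts[(feature, label)][value] += 1
                self.vocab[feature].add(value)
        return self

    def log_scores(self, row):
        """Return log P(label) + sum of log P(value | label) for each label."""
        scores = {}
        for label, n_label in self.class_counts.items():
            score = math.log(n_label / self.total)
            for feature, value in row.items():
                if feature not in self.vocab:
                    continue  # feature never seen in training: ignore it
                counts = self.value_counts[(feature, label)]
                seen = sum(counts.values())
                # The extra +1 slot reserves probability mass for unseen values.
                slots = len(self.vocab[feature]) + 1
                score += math.log(
                    (counts[value] + self.alpha) / (seen + self.alpha * slots)
                )
            scores[label] = score
        return scores

    def predict(self, row):
        scores = self.log_scores(row)
        return max(scores, key=scores.get)


# Hypothetical toy profiles; country labels use ISO 3166-1 alpha-2 codes.
rows = [
    {"timezone": "Amsterdam", "language": "nl", "location": "Enschede"},
    {"timezone": "Amsterdam", "language": "nl", "location": "Utrecht"},
    {"timezone": "London", "language": "en", "location": "London"},
    {"timezone": "London", "language": "en", "location": "Manchester"},
    {"timezone": "Amsterdam", "language": "en", "location": "Amsterdam"},
]
labels = ["NL", "NL", "GB", "GB", "NL"]

model = CategoricalNaiveBayes().fit(rows, labels)
print(model.predict({"timezone": "Amsterdam", "language": "nl", "location": "unknown"}))
print(model.predict({"timezone": "London", "language": "en"}))
```

Because each feature contributes an independent log-probability term, a profile with a missing or unrecognized location can still be classified from timezone and language alone. The same structure also hints at the error source named in the abstract: when two countries share a timezone, that term no longer separates them and the decision rests on the remaining features.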
Taxonomy
Topics: Human Mobility and Location-Based Analysis · Geographic Information Systems Studies · Complex Network Analysis Techniques
