Inferring the Origin Locations of Tweets with Quantitative Confidence
Reid Priedhorsky (1), Aron Culotta (2), Sara Y. Del Valle (1) ((1) Los Alamos National Laboratory, (2) Illinois Institute of Technology)

TL;DR
This paper presents a scalable, content-based method using Gaussian mixture models to accurately infer the geographic origin of tweets, providing quantified uncertainty and demonstrating effectiveness on a large, multilingual dataset.
Contribution
It introduces a simple yet effective Gaussian mixture model variant for tweet geolocation, together with novel metrics of accuracy, precision, and calibration for its uncertainty estimates, and it requires little training data.
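To make the idea of "a Gaussian mixture model per location estimate" concrete, here is a minimal sketch, not the authors' code. It assumes that each n-gram already has a fitted 2-D Gaussian mixture over (longitude, latitude), treats degrees as planar coordinates, and flattens the n-gram mixtures of one message into a single message-level mixture using per-n-gram weights. The names `Component`, `combine`, `density`, and `point_estimate`, and the hand-supplied weights, are hypothetical illustrations; how the paper fits or weights its models is not specified in this summary.

```python
import math
from dataclasses import dataclass


@dataclass
class Component:
    weight: float  # mixing weight within one n-gram's GMM
    mean: tuple    # (lon, lat) in degrees, treated as planar (assumption)
    cov: tuple     # 2x2 covariance ((a, b), (c, d)), assumed positive definite


def gaussian_pdf(x, mean, cov):
    """Density of a bivariate normal at point x."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Squared Mahalanobis distance (x - mean)^T cov^-1 (x - mean)
    m = dx * (inv[0][0] * dx + inv[0][1] * dy) + dy * (inv[1][0] * dx + inv[1][1] * dy)
    return math.exp(-0.5 * m) / (2.0 * math.pi * math.sqrt(det))


def combine(ngram_models, ngram_weights):
    """Flatten per-n-gram GMMs into one normalized message-level mixture.

    ngram_models: {ngram: [Component, ...]}; ngram_weights: {ngram: float}.
    The weighting scheme here is a hypothetical stand-in.
    """
    total = sum(ngram_weights[g] for g in ngram_models)
    combined = []
    for g, comps in ngram_models.items():
        share = ngram_weights[g] / total
        for comp in comps:
            combined.append(Component(share * comp.weight, comp.mean, comp.cov))
    return combined


def density(mixture, x):
    """Mixture density at point x."""
    return sum(c.weight * gaussian_pdf(x, c.mean, c.cov) for c in mixture)


def point_estimate(mixture, candidates):
    """Pick the candidate point with the highest mixture density."""
    return max(candidates, key=lambda p: density(mixture, p))
```

For example, a message containing "paris" (one component near Paris) and "bonjour" (components near Paris and Montréal) yields a three-component mixture whose density peaks near Paris. Keeping the whole mixture, rather than only the point estimate, is what allows the uncertainty of the estimate to be quantified.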
Findings
Reliable, well-calibrated location estimates on 13 million tweets
Effective with as few as 30,000 training tweets
Models remain effective over several weeks and leverage small-footprint toponyms and languages
Abstract
Social Internet content plays an increasingly critical role in many domains, including public health, disaster management, and politics. However, its utility is limited by missing geographic information; for example, fewer than 1.6% of Twitter messages (tweets) contain a geotag. We propose a scalable, content-based approach to estimate the location of tweets using a novel yet simple variant of Gaussian mixture models. Further, because real-world applications depend on quantified uncertainty for such estimates, we propose novel metrics of accuracy, precision, and calibration, and we evaluate our approach accordingly. Experiments on 13 million global, comprehensively multilingual tweets show that our approach yields reliable, well-calibrated results competitive with previous computationally intensive methods. We also show that a relatively small amount of training data is required for…
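The abstract names metrics of accuracy, precision, and calibration without defining them here. As a hedged illustration of the general idea, not the paper's definitions, the sketch below simplifies each location estimate to one isotropic 2-D Gaussian (mean, `sigma`). Its probability-`beta` prediction region is then a disc of radius sigma·sqrt(-2 ln(1 - beta)), because the squared distance divided by sigma² follows a chi-squared distribution with 2 degrees of freedom. Calibration is checked by comparing the observed fraction of true locations inside their regions with `beta`, and the mean region area serves as a rough precision measure. The function names and the single-Gaussian simplification are assumptions.

```python
import math
import random  # used only for the synthetic usage check below


def region_radius(sigma, beta):
    """Radius of the beta-probability disc of an isotropic 2-D Gaussian."""
    # P(||x - mu|| <= r) = 1 - exp(-r^2 / (2 sigma^2)); solve for r at probability beta.
    return sigma * math.sqrt(-2.0 * math.log(1.0 - beta))


def evaluate(estimates, truths, beta):
    """Return (observed coverage, mean region area) for a batch of estimates.

    estimates: list of (mu_x, mu_y, sigma); truths: list of true (x, y).
    A well-calibrated model has observed coverage close to beta;
    a smaller mean area means more precise estimates.
    """
    hits = 0
    total_area = 0.0
    for (mx, my, sigma), (tx, ty) in zip(estimates, truths):
        r = region_radius(sigma, beta)
        if math.hypot(tx - mx, ty - my) <= r:
            hits += 1
        total_area += math.pi * r * r
    n = len(truths)
    return hits / n, total_area / n


if __name__ == "__main__":
    # Synthetic check: true points drawn from exactly the predicted Gaussians.
    rng = random.Random(0)
    ests, truths = [], []
    for _ in range(4000):
        mx, my = rng.uniform(-100, 100), rng.uniform(-50, 50)
        ests.append((mx, my, 2.0))
        truths.append((rng.gauss(mx, 2.0), rng.gauss(my, 2.0)))
    print(evaluate(ests, truths, 0.5))  # coverage should be near 0.5
```

Reporting half the true spread (sigma 1.0 instead of 2.0) on the same data shrinks the regions and drops coverage to roughly 0.16 at beta = 0.5, which is the kind of overconfidence a calibration metric is meant to expose.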
