Geotagging One Hundred Million Twitter Accounts with Total Variation Minimization

Ryan Compton; David Jurgens; David Allen

arXiv:1404.7152·cs.SI·March 5, 2015

TL;DR

This paper introduces a scalable total variation minimization method that geolocates over 100 million Twitter users from publicly available data alone, with a median error of 6.38 km.

Contribution

It presents a novel optimization-based approach to large-scale social media geotagging that works regardless of whether a given user shares their location.

Findings

01

Geolocated over 80% of public tweets.

02

Median geolocation error of 6.38 km.

03

Scalable distributed algorithm for social network data.

Abstract

Geographically annotated social media is extremely valuable for modern information retrieval. However, researchers who can only access publicly visible data quickly find that social media users rarely publish location information. In this work, we provide a method which can geolocate the overwhelming majority of active Twitter users, independent of their location sharing preferences, using only publicly visible Twitter data. Our method infers an unknown user's location by examining their friends' locations. We frame the geotagging problem as an optimization over a social network with a total variation-based objective and provide a scalable and distributed algorithm for its solution. Furthermore, we show how a robust estimate of the geographic dispersion of each user's ego network can be used as a per-user accuracy measure which is effective at removing outlying errors.…
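
The core idea in the abstract, inferring a user's location from friends' locations and scoring confidence by how dispersed those friends are, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a single user with a handful of located friends, uses the friend location with the smallest total distance to the others (a medoid) as the estimate, and uses the median distance from that estimate to the friends as the dispersion score. The function names `haversine_km`, `estimate_location`, and `dispersion_km` are hypothetical and chosen for illustration only.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def estimate_location(friend_locations):
    """Pick the friend location minimizing total distance to all friends.

    Assumption: a medoid over friends stands in for the robust,
    median-like estimate implied by a total variation objective.
    """
    return min(
        friend_locations,
        key=lambda c: sum(haversine_km(c, f) for f in friend_locations),
    )


def dispersion_km(estimate, friend_locations):
    """Median distance from the estimate to the friends (robust spread)."""
    return median(haversine_km(estimate, f) for f in friend_locations)


# Hypothetical ego network: three friends near Los Angeles, one in New York.
friends = [(34.05, -118.24), (34.10, -118.30), (33.95, -118.40), (40.71, -74.01)]
estimate = estimate_location(friends)
spread = dispersion_km(estimate, friends)
```

Because both the estimate and the dispersion are median-like, the single distant friend in the example does not pull the estimate away from the Los Angeles cluster. A user whose friends are scattered widely would receive a large dispersion value and could be filtered out, which mirrors the per-user accuracy measure described in the abstract.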
