Describing and Understanding Neighborhood Characteristics through Online Social Media
Mohamed Kafsi, Henriette Cramer, Bart Thomee, David A. Shamma

TL;DR
This paper introduces the geographical hierarchy model (GHM), a probabilistic approach that leverages geotagged social media data to identify and compare region-specific content, improving classification accuracy over traditional methods.
Contribution
The paper presents the GHM, a novel probabilistic model that distinguishes local from general content in geotagged data, enhancing regional characterization and comparison capabilities.
Findings
GHM improves classification accuracy by 47% over Naive Bayes.
GHM improves classification accuracy by 27% over hierarchical TF-IDF.
The model surfaces region-specific content, which makes it possible to quantify a region's uniqueness and to compare similar but distant regions.
Abstract
Geotagged data can be used to describe regions in the world and discover local themes. However, not all data produced within a region is necessarily specifically descriptive of that area. To surface the content that is characteristic for a region, we present the geographical hierarchy model (GHM), a probabilistic model based on the assumption that data observed in a region is a random mixture of content that pertains to different levels of a hierarchy. We apply the GHM to a dataset of 8 million Flickr photos in order to discriminate between content (i.e., tags) that specifically characterizes a region (e.g., neighborhood) and content that characterizes surrounding areas or more general themes. Knowledge of the discriminative and non-discriminative terms used throughout the hierarchy enables us to quantify the uniqueness of a given region and to compare similar but distant regions. Our…
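The mixture assumption in the abstract can be illustrated with a small Python sketch. It is a toy illustration, not the authors' inference procedure: the three hierarchy levels, the tag lists, the mixing weights, and the helper names `tag_distribution` and `level_posteriors` are all assumptions made up for this example, and the per-level tag distributions are estimated from hand-separated toy lists rather than learned jointly as a full model would do. Given those fixed distributions, the sketch computes how strongly each tag is attributed to the neighborhood level versus the more general levels.

```python
from collections import Counter


def tag_distribution(tags, vocab, alpha=0.1):
    """Additively smoothed tag distribution over a fixed vocabulary."""
    counts = Counter(tags)
    total = len(tags) + alpha * len(vocab)
    return {t: (counts[t] + alpha) / total for t in vocab}


def level_posteriors(tag, level_dists, weights):
    """P(level | tag) under the mixture p(tag) = sum_l weights[l] * level_dists[l][tag]."""
    joint = {lvl: weights[lvl] * dist[tag] for lvl, dist in level_dists.items()}
    norm = sum(joint.values())
    return {lvl: p / norm for lvl, p in joint.items()}


# Toy, made-up tag lists for one path through a hypothetical hierarchy.
toy_tags = {
    "neighborhood": ["goldengatebridge", "goldengatebridge", "presidio", "fog"],
    "city": ["sanfrancisco", "sanfrancisco", "fog", "cablecar"],
    "global": ["vacation", "sunset", "vacation", "friends", "sunset"],
}
vocab = sorted({t for tags in toy_tags.values() for t in tags})

# One tag distribution per hierarchy level.
level_dists = {lvl: tag_distribution(tags, vocab) for lvl, tags in toy_tags.items()}

# Assumed mixing weights: how much of the region's content comes from each level.
weights = {"neighborhood": 0.3, "city": 0.3, "global": 0.4}

for tag in ["goldengatebridge", "fog", "vacation"]:
    post = level_posteriors(tag, level_dists, weights)
    best = max(post, key=post.get)
    print(tag, "->", best, {lvl: round(p, 3) for lvl, p in post.items()})
```

In this toy setup, a tag that appears mostly in the neighborhood list receives most of its posterior mass at the neighborhood level, while a generic tag is attributed to the global level. A tag that is equally common at two levels splits its mass between them, which is the kind of non-discriminative content the abstract describes.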
