Autodetection and Classification of Hidden Cultural City Districts from Yelp Reviews
Harini Suresh, Nicholas Locascio

TL;DR
This paper employs topic modeling and clustering techniques on Yelp reviews to identify and classify both known and hidden cultural districts within cities, enhancing understanding of urban cultural landscapes.
Contribution
It introduces a combined approach that applies LDA topic modeling to Yelp reviews and then clusters restaurants (K-means and a probabilistic mixture model) to detect and visualize hidden cultural districts.
Findings
Successfully identified known cultural districts like Chinatown.
Discovered hidden or less obvious districts based on review patterns.
Provided a visual map-based representation of districts and their similarities.
Abstract
Topic models are a way to discover underlying themes in an otherwise unstructured collection of documents. In this study, we use the Latent Dirichlet Allocation (LDA) topic model on a dataset of Yelp reviews to classify restaurants based on their reviews. Furthermore, we hypothesize that within a city, restaurants can be grouped into "clusters" based on both location and topic similarity. We use several clustering methods, including K-means clustering and a probabilistic mixture model, to uncover and classify districts within a city, both well-known (cultural areas like Chinatown) and hidden (hearsay like "the best street for Italian restaurants"). We use these models to display and label the clusters on a map. We also introduce a topic similarity heatmap that shows how similar each part of a city is to a new restaurant.
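The clustering and heatmap steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each restaurant already has a topic distribution (for example from an LDA model fitted on its reviews), uses a plain K-means written from scratch, and picks cosine similarity for the heatmap values, since the abstract does not name the similarity measure. The toy coordinates, the names `combine`, `kmeans`, `cosine`, and the `geo_weight` parameter are all hypothetical.

```python
import math
import random

def combine(lat, lon, topics, geo_weight=1.0):
    """Build one feature vector from location and topic distribution.
    geo_weight (an assumption) trades off distance against topic similarity."""
    return [geo_weight * lat, geo_weight * lon, *topics]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm: returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                new_centroids.append(
                    [sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids

def cosine(a, b):
    """Cosine similarity between two topic vectors (0.0 if either is all zeros)."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

# Toy data (made up): (lat, lon, topic distribution over 2 topics).
restaurants = [
    (0.00, 0.00, [0.90, 0.10]),
    (0.01, 0.02, [0.80, 0.20]),
    (0.02, 0.01, [0.85, 0.15]),
    (1.00, 1.00, [0.10, 0.90]),
    (1.01, 0.99, [0.20, 0.80]),
    (0.99, 1.02, [0.15, 0.85]),
]

points = [combine(lat, lon, t) for lat, lon, t in restaurants]
labels, centroids = kmeans(points, k=2)

# Heatmap values: similarity of a new restaurant's topics to each existing one,
# keyed by location so they could be drawn on a map.
new_topics = [0.9, 0.1]
heat = [(lat, lon, cosine(new_topics, t)) for lat, lon, t in restaurants]
```

In this sketch, a cluster corresponds to a candidate district, and the topic part of its centroid could be used to label it. With real data the latitude and longitude would likely need rescaling relative to the topic probabilities, which is what `geo_weight` stands in for here.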
Taxonomy
Topics: Music and Audio Processing · Text and Document Classification Technologies · Video Analysis and Summarization
Methods: Heatmap · k-Means Clustering
