Constructing Gazetteers from Volunteered Big Geo-Data Based on Hadoop
Song Gao, Linna Li, Wenwen Li, Krzysztof Janowicz, Yue Zhang

TL;DR
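This paper presents a scalable Hadoop-based platform for constructing geospatial gazetteers from volunteered geographic information. In experiments on geotagged Flickr data, it cuts processing time by an order of magnitude compared with desktop-based operations, and it adds a provenance-based trust model for quality assurance.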
Contribution
It introduces a novel distributed geoprocessing workflow on Hadoop for efficient gazetteer construction from Big Geo-Data, including a provenance-based trust model.
Findings
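The MapReduce-based workflow reduces processing time by an order of magnitude compared with traditional desktop-based operations.
The spatially enabled Hadoop cluster was tested on geotagged datasets from Flickr.
A provenance-based trust model is introduced for quality assurance of crowd-sourced entries.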
Abstract
Traditional gazetteers are built and maintained by authoritative mapping agencies. In the age of Big Data, it is possible to construct gazetteers in a data-driven approach by mining rich volunteered geographic information (VGI) from the Web. In this research, we build a scalable distributed platform and a high-performance geoprocessing workflow based on the Hadoop ecosystem to harvest crowd-sourced gazetteer entries. Using experiments based on geotagged datasets in Flickr, we find that the MapReduce-based workflow running on the spatially enabled Hadoop cluster can reduce the processing time compared with traditional desktop-based operations by an order of magnitude. We demonstrate how to use such a novel spatial-computing infrastructure to facilitate gazetteer research. In addition, we introduce a provenance-based trust model for quality assurance. This work offers new insights on…
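To make the map and reduce roles in such a workflow concrete, here is a minimal single-process sketch in Python. It is not the authors' implementation and does not use Hadoop: the record layout, the sample data, and the function names (`map_photo`, `reduce_place`, `build_gazetteer`) are all hypothetical, and the choice to summarise each place name by a centroid and bounding box is an assumption made for illustration.

```python
from collections import defaultdict

# Hypothetical input records: (photo_id, tag, longitude, latitude).
PHOTOS = [
    ("p1", "santa barbara", -119.70, 34.42),
    ("p2", "Santa Barbara ", -119.72, 34.40),
    ("p3", "isla vista", -119.86, 34.41),
]


def map_photo(record):
    """Map step: emit (normalised place name, (lon, lat)) for one record."""
    _photo_id, tag, lon, lat = record
    yield tag.strip().lower(), (lon, lat)


def reduce_place(name, points):
    """Reduce step: summarise all points that share one place name."""
    lons = [p[0] for p in points]
    lats = [p[1] for p in points]
    return {
        "name": name,
        "count": len(points),
        "centroid": (sum(lons) / len(lons), sum(lats) / len(lats)),
        "bbox": (min(lons), min(lats), max(lons), max(lats)),
    }


def build_gazetteer(records):
    """Run map, shuffle (group by key), and reduce in a single process."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_photo(record):
            groups[key].append(value)
    return {name: reduce_place(name, pts) for name, pts in groups.items()}


gazetteer = build_gazetteer(PHOTOS)
```

On a real cluster the grouping in `build_gazetteer` would be done by the framework's shuffle phase, so only the map and reduce functions would need to be supplied.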
Taxonomy
Topics: Data Management and Algorithms · Geographic Information Systems Studies · Peer-to-Peer Network Technologies
