Characterizing the Heterogeneity of the OpenStreetMap Data and Community
Ding Ma, Mats Sandberg, Bin Jiang

TL;DR
This paper analyzes the heterogeneity of OpenStreetMap (OSM) data and its contributor community across an eight-year historical archive. Using nonlinear statistical methods, it reveals heavy-tailed distributions and a core network of users.
Contribution
It provides a comprehensive characterization of the heterogeneity of OSM data and users. It applies power-law statistics and head/tail breaks to a large-scale, multi-year archive (692 GB).
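Head/tail breaks splits a set of heavy-tailed values around their mean. It then recurses into the head (the values above the mean) for as long as the head remains a minority. The Python sketch below is minimal and makes some assumptions. The 40% minority threshold (head_limit) and the function names are illustrative choices, not necessarily the settings the authors used. The ht-index computed here is the number of resulting classes, which equals the number of breaks plus one.

```python
from statistics import fmean


def head_tail_breaks(values, head_limit=0.4):
    """Return the successive mean values used as class breaks.

    head_limit is an assumed threshold: recursion continues only while the
    head (values strictly above the mean) is at most this share of the data.
    """
    breaks = []
    data = list(values)
    while len(data) > 1:
        m = fmean(data)
        head = [v for v in data if v > m]
        # Stop when there is no head or the head is no longer a minority.
        if not head or len(head) / len(data) > head_limit:
            break
        breaks.append(m)
        data = head  # recurse into the head only
    return breaks


def ht_index(values, head_limit=0.4):
    """Number of classes produced by head/tail breaks (breaks + 1)."""
    return len(head_tail_breaks(values, head_limit)) + 1


if __name__ == "__main__":
    # Toy heavy-tailed values (1, 1/2, ..., 1/10), not OSM data.
    toy = [1 / i for i in range(1, 11)]
    print(head_tail_breaks(toy))
    print(ht_index(toy))  # 3
```

On the toy series 1, 1/2, ..., 1/10, the sketch yields two breaks (about 0.293 and 0.611) and an ht-index of 3. On OSM data, the inputs would be per-user edit counts, per-element sizes, or per-element edit counts.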
Findings
Users, elements, and contributions all exhibit power-law or heavy-tailed distributions.
Most elements are small and lightly edited, while few are large and heavily edited.
Approximately 500 users form a highly networked core group.
Abstract
OpenStreetMap (OSM) constitutes an unprecedented, free, geographic information source contributed by millions of individuals, resulting in a database of great volume and heterogeneity. In this study, we characterize the heterogeneity of the entire OSM database and historical archive in the context of big data. We consider all users, geographic elements, and user contributions from an eight-year data archive, at a size of 692 GB. We rely on some nonlinear methods such as power-law statistics and head/tail breaks to uncover and illustrate the underlying scaling properties. All three aspects (users, elements, and contributions) demonstrate striking power laws or heavy-tailed distributions. The heavy-tailed distributions imply that there are far more small elements than large ones, far more inactive users than active ones, and far more lightly edited elements than heavily edited ones.…
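A common first step in checking whether a quantity follows a power law is a maximum-likelihood estimate of its exponent. Examples of such quantities are edits per user or edits per element. The estimate uses only the values above a lower cutoff x_min:

alpha_hat = 1 + n / sum(ln(x_i / x_min))

Its standard error is (alpha_hat - 1) / sqrt(n). The sketch below makes several assumptions. It uses the continuous approximation and a hand-picked x_min. The function names are hypothetical. This summary does not state the authors' exact fitting procedure, such as how x_min was chosen or whether they used a discrete estimator.

```python
import math
import random


def powerlaw_alpha(values, x_min):
    """Continuous power-law MLE for the tail x >= x_min.

    Returns (alpha_hat, standard_error, n_tail). x_min is assumed to be
    chosen by the caller; this sketch does not search for it.
    """
    if x_min <= 0:
        raise ValueError("x_min must be positive")
    tail = [x for x in values if x >= x_min]
    n = len(tail)
    log_sum = sum(math.log(x / x_min) for x in tail)
    if n == 0 or log_sum == 0:
        raise ValueError("need tail values strictly above x_min")
    alpha = 1.0 + n / log_sum
    return alpha, (alpha - 1.0) / math.sqrt(n), n


def sample_pareto(alpha, x_min, size, rng):
    """Draw synthetic power-law values by inverse-transform sampling."""
    # 1 - rng.random() lies in (0, 1], so the power is always finite.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(size)]


if __name__ == "__main__":
    rng = random.Random(42)
    data = sample_pareto(2.5, 1.0, 20_000, rng)  # synthetic, not OSM data
    alpha, se, n = powerlaw_alpha(data, x_min=1.0)
    print(f"alpha = {alpha:.3f} +/- {se:.3f} from {n} values")
```

For 20,000 synthetic values drawn with alpha = 2.5, the estimate should land close to 2.5. Real OSM counts would also need a goodness-of-fit test before the tail could be called a power law. This matters because the abstract distinguishes power laws from more general heavy-tailed distributions.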
