# Predicting Economic Development using Geolocated Wikipedia Articles

**Authors:** Evan Sheehan, Chenlin Meng, Matthew Tan, Burak Uzkent, Neal Jean, David Lobell, Marshall Burke, Stefano Ermon

arXiv: 1905.01627 · 2019-05-14

## TL;DR

This paper introduces an NLP-based approach to estimating socioeconomic indicators in developing regions from geolocated Wikipedia articles. When paired with nightlights satellite imagery, it outperforms all previously published benchmarks for the task.

## Contribution

The study presents a new method leveraging open-source Wikipedia data and NLP techniques to predict community socioeconomic indicators, addressing data scarcity in developing countries.

## Key findings

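- Modern NLP techniques applied to nearby geolocated Wikipedia articles can predict community-level asset wealth and education outcomes.
- When paired with nightlights satellite imagery, the method outperforms all previously published benchmarks for this prediction task.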
- Demonstrates Wikipedia's potential as a valuable data source for social science research.

## Abstract

Progress on the UN Sustainable Development Goals (SDGs) is hampered by a persistent lack of data regarding key social, environmental, and economic indicators, particularly in developing countries. For example, data on poverty, the first of seventeen SDGs, is both spatially sparse and infrequently collected in Sub-Saharan Africa due to the high cost of surveys. Here we propose a novel method for estimating socioeconomic indicators using open-source, geolocated textual information from Wikipedia articles. We demonstrate that modern NLP techniques can be used to predict community-level asset wealth and education outcomes using nearby geolocated Wikipedia articles. When paired with nightlights satellite imagery, our method outperforms all previously published benchmarks for this prediction task, indicating the potential of Wikipedia to inform both research in the social sciences and future policy decisions.
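
The abstract's phrase "nearby geolocated Wikipedia articles" implies a step that selects articles by distance from a survey location. The following is a minimal sketch of one way such a step could look, not the authors' implementation. It uses great-circle (haversine) distance, and the function names, the `lat`/`lon`/`title` fields, the value of `k`, and the toy coordinates are all assumptions made for illustration. How the selected articles are then turned into features and predictions is not described in this summary and is not shown here.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_articles(point, articles, k=3):
    """Return the k articles closest to `point`, nearest first.

    `point` is a (lat, lon) tuple; each article is a dict with hypothetical
    keys "title", "lat", and "lon".
    """
    lat, lon = point
    ranked = sorted(articles, key=lambda a: haversine_km(lat, lon, a["lat"], a["lon"]))
    return ranked[:k]


# Toy placeholder data, not taken from the paper.
toy_articles = [
    {"title": "far", "lat": 10.0, "lon": 10.0},
    {"title": "near", "lat": 0.1, "lon": 0.1},
    {"title": "mid", "lat": 1.0, "lon": 1.0},
]
closest = nearest_articles((0.0, 0.0), toy_articles, k=2)
print([a["title"] for a in closest])
```

A linear scan with a sort is adequate for a toy example. For a large article set, a spatial index would be a more practical choice.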

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.01627/full.md

## Figures

17 figures with captions in the complete paper: https://tomesphere.com/paper/1905.01627/full.md

## References

32 references — full list in the complete paper: https://tomesphere.com/paper/1905.01627/full.md

---
Source: https://tomesphere.com/paper/1905.01627