On Spatial Lag Models estimated using crowdsourcing, web-scraping or other unconventionally collected data
Giuseppe Arbia, Vincenzo Nardelli

TL;DR
This paper addresses the challenge of estimating spatial lag models using unconventional Big Data sources like crowdsourcing and web scraping, proposing a post-sampling bias correction method validated through simulations and an empirical case study.
Contribution
It generalizes a post-sampling approach to spatial lag models, demonstrating bias reduction in non-probabilistic data collection contexts.
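For readers unfamiliar with the model class, the following is a minimal sketch of how data from a spatial lag model, y = ρWy + Xβ + ε, can be simulated through its reduced form y = (I − ρW)⁻¹(Xβ + ε). Everything specific here is an assumption for illustration and is not taken from the paper: the regular grid, the rook-contiguity weights, the parameter values, and the helper names `rook_weights` and `simulate_spatial_lag`.

```python
import numpy as np

def rook_weights(n_side):
    """Row-standardised rook-contiguity weights for an n_side x n_side grid
    (hypothetical helper; the paper's weight matrix may differ)."""
    n = n_side * n_side
    W = np.zeros((n, n))
    for r in range(n_side):
        for c in range(n_side):
            i = r * n_side + c
            # Neighbours share an edge: up, down, left, right.
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_side and 0 <= cc < n_side:
                    W[i, rr * n_side + cc] = 1.0
    # Each row sums to one after standardisation.
    return W / W.sum(axis=1, keepdims=True)

def simulate_spatial_lag(W, X, beta, rho, sigma, rng):
    """Draw y from y = rho*W*y + X*beta + eps by solving (I - rho*W) y = X*beta + eps."""
    n = W.shape[0]
    eps = rng.normal(0.0, sigma, size=n)
    return np.linalg.solve(np.eye(n) - rho * W, X @ beta + eps)

# Illustrative values only (assumed, not from the paper).
rng = np.random.default_rng(0)
W = rook_weights(5)
X = np.column_stack([np.ones(25), rng.normal(size=25)])
y = simulate_spatial_lag(W, X, np.array([1.0, 2.0]), rho=0.5, sigma=1.0, rng=rng)
```

With a row-standardised W and |ρ| < 1, the matrix I − ρW is invertible, so the linear solve is well defined.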
Findings
Bias in parameter estimates from convenience samples can be mitigated with post-sampling.
Post-sampling reduces bias but increases estimator variance.
An MSE-correction strategy balances bias reduction and variance increase.
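The trade-off described in these findings rests on the standard decomposition MSE = bias² + variance. The sketch below computes the three quantities from a set of Monte Carlo estimates of a single parameter. It is not the paper's MSE-correction strategy, only the decomposition that any such comparison relies on; the function name `mse_decomposition` and the numbers in the usage lines are made up for illustration.

```python
import numpy as np

def mse_decomposition(estimates, true_value):
    """Return (bias, variance, mse) of Monte Carlo estimates of one parameter.

    Uses the population variance (ddof=0), so that mse == bias**2 + variance
    holds exactly up to floating-point error.
    """
    estimates = np.asarray(estimates, dtype=float)
    bias = estimates.mean() - true_value
    variance = estimates.var()
    mse = np.mean((estimates - true_value) ** 2)
    return bias, variance, mse

# Made-up replicate estimates of a parameter whose true value is 0.5.
raw = [0.62, 0.65, 0.63, 0.64]        # biased but tightly clustered
corrected = [0.40, 0.58, 0.47, 0.55]  # centred on the truth but more dispersed
for label, est in (("raw", raw), ("corrected", corrected)):
    bias, variance, mse = mse_decomposition(est, 0.5)
    print(label, bias, variance, mse)
```

Comparing the two MSE values shows whether a given reduction in bias is worth the accompanying increase in variance.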
Abstract
The Big Data revolution is challenging state-of-the-art statistical and econometric techniques, not only because of the computational burden connected with the high volume and speed at which data are generated, but even more because of the variety of sources through which data are collected (Arbia, 2021). This paper concentrates specifically on this last aspect. Common examples of non-traditional Big Data sources are crowdsourcing (data voluntarily collected by individuals) and web scraping (data extracted from websites and reshaped into a structured dataset). A characteristic common to these unconventional data collections is the lack of any precise statistical sample design, a situation described in statistics as 'convenience sampling'. As is well known, under these conditions no probabilistic inference is possible. To overcome this problem, Arbia et al. (2018) proposed the use of a…
Taxonomy
Topics: Spatial and Panel Data Analysis · Regional Economics and Spatial Analysis · Land Use and Ecosystem Services
