Instrument variable detection with graph learning: an application to high dimensional GIS-census data for house pricing
Ning Xu, Timothy C.G. Fisher, Jian Hong

TL;DR
This paper introduces a novel graph learning approach combining machine learning and biostatistics tools to identify valid instruments and estimate causal effects in high-dimensional GIS-census data for house pricing, improving robustness and interpretability.
Contribution
It presents a data-driven method for instrument variable detection using graph learning, addressing high dimensionality and multicollinearity in causal inference for real estate data.
Findings
Efficient and robust instrument validation in high-dimensional data
Recovery of a sparse, interpretable causal structure
Alignment with classical econometric results and theories
Abstract
Endogeneity bias and instrument variable validation have always been important topics in statistics and econometrics. In the era of big data, such issues typically combine with dimensionality issues and, hence, require even more attention. In this paper, we merge two well-known tools from machine learning and biostatistics (variable selection algorithms and probabilistic graphs) to estimate house prices and the corresponding causal structure using 2010 data on Sydney. The estimation uses a 200-gigabyte ultrahigh-dimensional database consisting of local school data, GIS information, census data, house characteristics and other socio-economic records. Using "big data", we show that it is possible to perform data-driven instrument selection efficiently and purge invalid instruments. Our approach improves the sparsity of variable selection, stability and robustness in the…
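To make the idea of data-driven instrument selection concrete, here is a minimal sketch on simulated data. It is not the authors' procedure: it uses a plain coordinate-descent lasso as a stand-in for the variable selection step, covers only instrument relevance (the first stage), and omits the probabilistic graph step used to purge invalid instruments. All names (`lasso_cd`, `two_stage_least_squares`), the penalty `lam=0.2`, and the data-generating process are assumptions made for illustration.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate-descent lasso for (1/2n)||y - Xb||^2 + lam*||b||_1 (illustrative)."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]            # drop column j's current contribution
            rho = X[:, j] @ resid / n             # partial covariance with the residual
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid -= X[:, j] * beta[j]            # add the updated contribution back
    return beta

def two_stage_least_squares(y, x, Z):
    """2SLS slope for a single endogenous regressor x with instrument matrix Z."""
    n = len(y)
    Z1 = np.column_stack([np.ones(n), Z])
    x_hat = Z1 @ np.linalg.lstsq(Z1, x, rcond=None)[0]      # first stage: project x on Z
    X_hat = np.column_stack([np.ones(n), x_hat])
    return np.linalg.lstsq(X_hat, y, rcond=None)[0][1]      # second stage: slope on x_hat

# Simulated data (assumption): 50 candidate instruments, only the first 3 are relevant.
rng = np.random.default_rng(0)
n, p = 2000, 50
Z = rng.standard_normal((n, p))
u = rng.standard_normal(n)                                  # unobserved confounder
x = Z[:, :3].sum(axis=1) + u + rng.standard_normal(n)       # endogenous regressor
y = 2.0 * x + 2.0 * u + rng.standard_normal(n)              # true causal slope is 2.0

# Step 1: select relevant instruments with a lasso first stage.
Zs = (Z - Z.mean(axis=0)) / Z.std(axis=0)
beta = lasso_cd(Zs, x - x.mean(), lam=0.2)
selected = np.flatnonzero(beta)

# Step 2: compare naive OLS with 2SLS on the selected instruments.
X1 = np.column_stack([np.ones(n), x])
ols_slope = np.linalg.lstsq(X1, y, rcond=None)[0][1]
iv_slope = two_stage_least_squares(y, x, Z[:, selected])

print("selected instruments:", selected)
print("OLS slope:", round(ols_slope, 3), "| 2SLS slope:", round(iv_slope, 3))
```

In this simulation the confounder `u` pushes the OLS slope above the true value of 2.0, while 2SLS on the lasso-selected instruments should land close to it. A selection step of this kind only screens for relevance; checking that the selected instruments satisfy the exclusion restriction requires a separate validation step.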
Taxonomy
Topics: Housing Market and Economics · Statistical Methods and Inference · Water resources management and optimization
Methods: Pruning
