Unveiling Location-Specific Price Drivers: A Two-Stage Cluster Analysis for Interpretable House Price Predictions
Paul Gümmer, Julian Rosenberger, Mathias Kraus, Patrick Zschech, Nico Hambauer

TL;DR
This paper introduces a two-stage clustering approach for house price prediction that improves interpretability and accuracy by modeling localized market variations in German real estate data.
Contribution
It presents a novel two-stage clustering method in which each cluster is modeled with an interpretable model, either linear regression (LR) or a generalized additive model (GAM), for better localized house price valuation.
Findings
36% improvement in mean absolute error for the GAM, compared to the model without clustering
58% improvement in mean absolute error for LR, compared to the model without clustering
Graphical analysis reveals pattern shifts between clusters
Abstract
House price valuation remains challenging due to localized market variations. Existing approaches often rely on black-box machine learning models, which lack interpretability, or simplistic methods like linear regression (LR), which fail to capture market heterogeneity. To address this, we propose a machine learning approach that applies two-stage clustering, first grouping properties based on minimal location-based features before incorporating additional features. Each cluster is then modeled using either LR or a generalized additive model (GAM), balancing predictive performance with interpretability. Constructing and evaluating our models on 43,309 German house property listings from 2023, we achieve a 36% improvement for the GAM and 58% for LR in mean absolute error compared to models without clustering. Additionally, graphical analyses unveil pattern shifts between clusters. These…
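The two-stage idea in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration and not taken from the paper: k-means as the clustering algorithm, latitude/longitude as the "minimal location-based features", living area and building age as the "additional features", the cluster counts, and the synthetic data. The second stage is interpreted as re-clustering inside each location cluster once the additional features are included. Only the LR variant is shown; a GAM could be fitted per cluster in the same way. The relative improvement is computed as (baseline MAE - clustered MAE) / baseline MAE, which is one plausible reading of the reported percentages.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.preprocessing import StandardScaler


def two_stage_labels(X_loc, X_extra, k_loc=3, k_sub=2, seed=0):
    """Stage 1: cluster on location only.
    Stage 2: split each location cluster again using location plus extra features."""
    stage1 = KMeans(n_clusters=k_loc, n_init=10, random_state=seed).fit_predict(X_loc)
    labels = np.empty(len(X_loc), dtype=int)
    for c in range(k_loc):
        idx = np.where(stage1 == c)[0]
        # Scale so that no single feature dominates the distance computation.
        Z = StandardScaler().fit_transform(np.hstack([X_loc[idx], X_extra[idx]]))
        sub = KMeans(n_clusters=k_sub, n_init=10, random_state=seed).fit_predict(Z)
        labels[idx] = c * k_sub + sub  # unique id per (stage-1, stage-2) pair
    return labels


def fit_per_cluster(X, y, labels):
    """Fit one linear regression per final cluster."""
    return {c: LinearRegression().fit(X[labels == c], y[labels == c])
            for c in np.unique(labels)}


def predict_per_cluster(models, X, labels):
    """Predict each row with the model of the cluster it belongs to."""
    y_hat = np.empty(len(X))
    for c, model in models.items():
        mask = labels == c
        y_hat[mask] = model.predict(X[mask])
    return y_hat


def relative_improvement(mae_base, mae_new):
    """Fraction by which mae_new undercuts mae_base (0.36 means 36%)."""
    return (mae_base - mae_new) / mae_base


# Synthetic listings (made up): three regions with different price levels.
rng = np.random.default_rng(0)
n = 600
centers = np.array([[48.1, 11.6], [52.5, 13.4], [50.9, 6.9]])
region = rng.integers(0, 3, size=n)
X_loc = centers[region] + rng.normal(scale=0.1, size=(n, 2))
area = rng.uniform(60, 250, size=n)   # living area in m^2
age = rng.uniform(0, 80, size=n)      # building age in years
X_extra = np.column_stack([area, age])
price_per_m2 = np.array([9000.0, 6000.0, 4000.0])[region]
y = price_per_m2 * area - 1500.0 * age + rng.normal(scale=20000, size=n)
X = np.hstack([X_loc, X_extra])

# Baseline: one global linear regression without clustering.
mae_global = mean_absolute_error(y, LinearRegression().fit(X, y).predict(X))

# Two-stage clustering followed by one regression per cluster.
labels = two_stage_labels(X_loc, X_extra)
models = fit_per_cluster(X, y, labels)
mae_clustered = mean_absolute_error(y, predict_per_cluster(models, X, labels))

print(f"MAE without clustering: {mae_global:,.0f}")
print(f"MAE with clustering:    {mae_clustered:,.0f}")
print(f"Relative improvement:   {relative_improvement(mae_global, mae_clustered):.1%}")
```

Note that this sketch evaluates in-sample for brevity. A realistic evaluation would use held-out listings and would assign each new listing to a cluster before predicting. Because each cluster keeps its own coefficients, the fitted slopes can be compared across clusters, which is the kind of inspection that per-cluster interpretable models make possible.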
Taxonomy
Topics: Housing Market and Economics · Stock Market Forecasting Methods · Financial Distress and Bankruptcy Prediction
