Real Estate Attribute Prediction from Multiple Visual Modalities with Missing Data
Eric Stumpe, Miroslav Despotovic, Zedong Zhang, Matthias Zeppelzauer

TL;DR
This paper explores predicting real estate attributes using indoor and outdoor images, addressing missing data challenges, and demonstrates that multimodal fusion improves prediction accuracy by up to 5%.
Contribution
It introduces multimodal fusion models for real estate attribute prediction that handle missing visual modalities, and shows that enriching the training data with incomplete samples improves performance.
Findings
Fusion of indoor and outdoor images boosts accuracy by up to 5%.
Enriching training data with incomplete samples improves predictions.
Three models with different multimodal fusion strategies are compared across three use cases.
Abstract
The assessment and valuation of real estate requires large datasets with real estate information. Unfortunately, real estate databases are usually sparse in practice, i.e., not every important attribute is available for each property. In this paper, we study the potential of predicting high-level real estate attributes from visual data, specifically from two visual modalities, namely indoor (interior) and outdoor (facade) photos. We design three models using different multimodal fusion strategies and evaluate them for three different use cases. A particular challenge is handling missing modalities. We evaluate different fusion strategies, present baselines for the different prediction tasks, and find that enriching the training data with additional incomplete samples can lead to an improvement in prediction accuracy. Furthermore, the fusion of information from indoor and…
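To make the missing-modality problem concrete, here is a minimal sketch assuming a simple late-fusion scheme that averages the class probabilities of whichever modalities (indoor, outdoor) are present for a property. This is an illustrative stand-in, not necessarily one of the three fusion strategies the authors evaluate; the names `late_fuse`, `predict`, and `enrich` and the dict-based sample format are hypothetical. The `enrich` helper shows the idea of keeping incomplete samples (only one photo type) in the training set instead of restricting it to complete indoor/outdoor pairs.

```python
from typing import Dict, List, Optional

# Hypothetical per-modality output: a class-probability vector from an
# indoor-image model or an outdoor (facade) model; None marks a missing photo.
Probs = Optional[List[float]]


def late_fuse(indoor: Probs, outdoor: Probs) -> List[float]:
    """Average class probabilities over the modalities that are present."""
    available = [p for p in (indoor, outdoor) if p is not None]
    if not available:
        raise ValueError("sample has no visual modality")
    n_classes = len(available[0])
    if any(len(p) != n_classes for p in available):
        raise ValueError("modalities disagree on the number of classes")
    # Missing modalities are simply skipped, so the average is taken
    # over one or two vectors depending on what is available.
    return [sum(p[c] for p in available) / len(available) for c in range(n_classes)]


def predict(indoor: Probs, outdoor: Probs) -> int:
    """Return the index of the most probable class after fusion."""
    fused = late_fuse(indoor, outdoor)
    return max(range(len(fused)), key=fused.__getitem__)


def enrich(samples: List[Dict[str, Probs]]) -> List[Dict[str, Probs]]:
    """Keep every sample with at least one modality, not only complete pairs."""
    return [s for s in samples if s["indoor"] is not None or s["outdoor"] is not None]
```

For example, `late_fuse([0.5, 0.5], [0.0, 1.0])` yields `[0.25, 0.75]`, while `late_fuse(None, [0.0, 1.0])` falls back to the outdoor prediction alone. Averaging is only one choice; learned weighting or feature-level fusion with masking are common alternatives.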
