TL;DR
This paper introduces a novel street view image classification method that leverages detected building bounding boxes and their contextual metadata, significantly improving urban land use classification accuracy over traditional CNN-based models.
Contribution
The paper proposes a new
Findings
12.65% improvement in macro-precision
12% improvement in macro-recall
Introduces a new dataset 'BEAUTY' with 19,070 images
Abstract
Street view images classification aiming at urban land use analysis is difficult because the class labels (e.g., commercial area), are concepts with higher abstract level compared to the ones of general visual tasks (e.g., persons and cars). Therefore, classification models using only visual features often fail to achieve satisfactory performance. In this paper, a novel approach based on a "Detector-Encoder-Classifier" framework is proposed. Instead of using visual features of the whole image directly as common image-level models based on convolutional neural networks (CNNs) do, the proposed framework firstly obtains the bounding boxes of buildings in street view images from a detector. Their contextual information such as the co-occurrence patterns of building classes and their layout are then encoded into metadata by the proposed algorithm "CODING" (Context encOding of Detected…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
