From Pixels to Predicates: Structuring urban perception with scene graphs
Yunlong Liu, Shuyang Li, Pengyuan Liu, Yu Zhang, Rudi Stouffs

TL;DR
This paper introduces a three-stage pipeline that converts street view images into structured scene graphs to improve the prediction of urban perception indicators, outperforming image-only methods and enhancing interpretability.
Contribution
It presents a novel approach combining scene graph parsing, graph autoencoding, and neural prediction to model urban perception more effectively and interpretably.
Findings
Improves perception prediction accuracy by 26% over baselines
Maintains strong cross-city generalization
Identifies relational patterns affecting perception scores
Abstract
Perception research is increasingly modelled using streetscapes, yet many approaches still rely on pixel features or object co-occurrence statistics, overlooking the explicit relations that shape human perception. This study proposes a three-stage pipeline that transforms street view imagery (SVI) into structured representations for predicting six perceptual indicators. In the first stage, each image is parsed using an open-set Panoptic Scene Graph model (OpenPSG) to extract object-predicate-object triplets. In the second stage, compact scene-level embeddings are learned through a heterogeneous graph autoencoder (GraphMAE). In the third stage, a neural network predicts perception scores from these embeddings. We evaluate the proposed approach against image-only baselines in terms of accuracy, precision, and cross-city generalization. Results indicate that (i) our approach improves…
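To make the object-predicate-object representation concrete, here is a minimal Python sketch of how triplets from the first stage could be arranged into a typed graph before any embedding is learned. It is an illustration under stated assumptions, not the authors' implementation: the triplet values, the function name `build_scene_graph`, and the choice to merge objects by category label (rather than keeping separate instances) are all hypothetical, and no OpenPSG or GraphMAE API is used.

```python
from collections import defaultdict

# Hypothetical triplets for a single street view image.
# Real scene graph parser output will have a different format.
EXAMPLE_TRIPLETS = [
    ("person", "walking on", "sidewalk"),
    ("car", "parked on", "road"),
    ("tree", "beside", "road"),
]


def build_scene_graph(triplets):
    """Group (subject, predicate, object) triplets into a typed graph.

    Returns:
        nodes: dict mapping object label -> integer node index
        edges: dict mapping predicate -> list of (src_index, dst_index)

    Assumption: objects are merged by category label, which is a
    simplification; a panoptic parser distinguishes instances.
    """
    nodes = {}
    edges = defaultdict(list)
    for subj, pred, obj in triplets:
        for label in (subj, obj):
            # Assign the next free index the first time a label is seen.
            nodes.setdefault(label, len(nodes))
        # Each predicate acts as its own edge type (heterogeneous graph).
        edges[pred].append((nodes[subj], nodes[obj]))
    return nodes, dict(edges)


if __name__ == "__main__":
    nodes, edges = build_scene_graph(EXAMPLE_TRIPLETS)
    print(nodes)
    print(edges)
```

Keeping one edge list per predicate is what makes the graph heterogeneous: a downstream encoder can treat "parked on" and "beside" as different relation types instead of collapsing them into plain co-occurrence.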
Taxonomy
Topics: Automated Road and Building Extraction · Human Mobility and Location-Based Analysis · Advanced Neural Network Applications
