From Pixels to Predicates: Structuring urban perception with scene graphs

Yunlong Liu; Shuyang Li; Pengyuan Liu; Yu Zhang; Rudi Stouffs

arXiv:2512.19221 · cs.CV · December 23, 2025

TL;DR

This paper introduces a three-stage pipeline that converts street view images into structured scene graphs to improve the prediction of urban perception indicators, outperforming image-only methods and enhancing interpretability.

Contribution

The paper combines scene graph parsing, graph autoencoding, and neural prediction to model urban perception more effectively and interpretably.

Findings

01 Improves perception prediction accuracy by 26% over baselines
02 Maintains strong cross-city generalization
03 Identifies relational patterns affecting perception scores

Abstract

Perception research is increasingly modelled using streetscapes, yet many approaches still rely on pixel features or object co-occurrence statistics, overlooking the explicit relations that shape human perception. This study proposes a three-stage pipeline that transforms street view imagery (SVI) into structured representations for predicting six perceptual indicators. In the first stage, each image is parsed using an open-set Panoptic Scene Graph model (OpenPSG) to extract object-predicate-object triplets. In the second stage, compact scene-level embeddings are learned through a heterogeneous graph autoencoder (GraphMAE). In the third stage, a neural network predicts perception scores from these embeddings. We evaluate the proposed approach against image-only baselines in terms of accuracy, precision, and cross-city generalization. Results indicate that (i) our approach improves…
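
To make the object-predicate-object representation concrete, here is a minimal Python sketch of how triplets from one image might be collected into a typed graph and summarized as a fixed-length vector. It is an illustration only, not the authors' implementation: the `Triplet` type, the `build_scene_graph` and `relation_histogram` helpers, the example triplets, and the predicate vocabulary are all hypothetical. Nodes are simplified to object categories rather than instances, and the predicate count vector is a crude stand-in for the learned GraphMAE embedding; no OpenPSG or GraphMAE code is called.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Triplet(NamedTuple):
    """Hypothetical container for one parsed relation in an image."""
    subject: str    # object category, e.g. "tree"
    predicate: str  # relation label, e.g. "beside"
    obj: str        # object category, e.g. "road"


def build_scene_graph(triplets):
    """Collect category nodes and count edges keyed by (subject, predicate, object)."""
    nodes = set()
    edges = defaultdict(int)
    for t in triplets:
        nodes.update((t.subject, t.obj))
        edges[(t.subject, t.predicate, t.obj)] += 1
    return {"nodes": sorted(nodes), "edges": dict(edges)}


def relation_histogram(graph, vocabulary):
    """Count predicates over a fixed vocabulary.

    Assumption: this simple count vector stands in for a learned
    scene-level embedding; it is not what a graph autoencoder produces.
    """
    counts = Counter()
    for (_, predicate, _), n in graph["edges"].items():
        counts[predicate] += n
    return [counts[p] for p in vocabulary]


# Made-up triplets for a single street view image.
triplets = [
    Triplet("tree", "beside", "road"),
    Triplet("car", "parked on", "road"),
    Triplet("person", "walking on", "sidewalk"),
    Triplet("tree", "beside", "road"),
]

graph = build_scene_graph(triplets)
vocabulary = ["beside", "parked on", "walking on"]  # hypothetical predicate set
features = relation_histogram(graph, vocabulary)
print(graph["nodes"])
print(features)
```

In a pipeline of the kind the abstract describes, a vector like `features` would be replaced by the autoencoder's scene-level embedding and passed to a downstream predictor of perception scores.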

Taxonomy

Topics: Automated Road and Building Extraction · Human Mobility and Location-Based Analysis · Advanced Neural Network Applications