Scene-Aware Urban Design: A Human-AI Recommendation Framework Using Co-Occurrence Embeddings and Vision-Language Models
Rodrigo Gallardo, Oz Fishman, Alexander Htet Kyaw

TL;DR
This paper presents a human-in-the-loop AI framework that analyzes urban scenes using co-occurrence embeddings and vision-language models to suggest contextually relevant design interventions, promoting participatory urban planning.
Contribution
It introduces a novel AI system combining object detection, co-occurrence analysis, and scene reasoning to support micro-scale urban design with continuous human input.
Findings
Successfully detects urban objects using Grounding DINO
Builds co-occurrence embeddings revealing common spatial patterns
Generates contextually relevant object suggestions for urban design
Abstract
This paper introduces a human-in-the-loop computer vision framework that uses generative AI to propose micro-scale design interventions in public space and support more continuous, local participation. Using Grounding DINO and a curated subset of the ADE20K dataset as a proxy for the urban built environment, the system detects urban objects and builds co-occurrence embeddings that reveal common spatial configurations. From this analysis, the user receives five statistically likely complements to a chosen anchor object. A vision-language model then reasons over the scene image and the selected pair to suggest a third object that completes a more complex urban tactic. The workflow keeps people in control of selection and refinement and aims to move beyond top-down master planning by grounding choices in everyday patterns and lived experience.
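The co-occurrence step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the detector output has already been reduced to a list of object labels per scene, it replaces the paper's co-occurrence embeddings with raw co-occurrence counts, and it ranks complements by the conditional frequency of each label given the anchor. The function names, the scoring rule, and the toy scenes are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(scenes):
    """Count scene-level co-occurrence of labels (hypothetical helper).

    `scenes` is a list of label lists, one per image. Duplicate labels
    inside one scene are collapsed, so counts are per scene, not per box.
    """
    pair_counts = Counter()
    label_counts = Counter()
    for labels in scenes:
        unique = sorted(set(labels))
        label_counts.update(unique)
        for a, b in combinations(unique, 2):
            pair_counts[(a, b)] += 1
    return pair_counts, label_counts


def top_complements(anchor, pair_counts, label_counts, k=5):
    """Rank other labels by the fraction of anchor scenes they appear in."""
    n_anchor = label_counts[anchor]
    if n_anchor == 0:
        return []
    scores = {}
    for (a, b), n in pair_counts.items():
        if a == anchor:
            scores[b] = n / n_anchor
        elif b == anchor:
            scores[a] = n / n_anchor
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Toy data standing in for detector output; not taken from ADE20K.
scenes = [
    ["bench", "tree", "streetlight"],
    ["bench", "tree", "trash can"],
    ["bench", "bicycle rack"],
    ["tree", "streetlight"],
]
pair_counts, label_counts = build_cooccurrence(scenes)
suggestions = top_complements("bench", pair_counts, label_counts, k=5)
```

With the default `k=5`, the returned list corresponds to the five complements offered to the user for a chosen anchor. A learned embedding, as the paper describes, would replace the count-based score with a similarity measure over vectors.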
Taxonomy
Topics: Multimodal Machine Learning Applications · Urban Design and Spatial Analysis · Smart Cities and Technologies
