From Street View to Visual Network: Mapping the Visibility of Urban Landmarks with Vision-Language Models
Zicheng Fan, Kunihiko Fujiwara, Pengyuan Liu, Fan Zhang, Filip Biljecki

TL;DR
This paper introduces a novel image-based approach using vision-language models to assess urban landmark visibility from street view imagery, enabling detailed visual connectivity mapping in cities.
Contribution
It reformulates landmark visibility assessment as an urban visual search problem and constructs a visibility graph to analyze visual connectivity among landmarks and urban spaces.
Findings
Achieved 87% detection accuracy across six landmarks in global cities.
Constructed a visibility graph revealing multi-landmark connections and key mediating locations.
Demonstrated the method's effectiveness as a practical alternative to traditional line-of-sight analysis.
Abstract
Visibility analysis in urban planning has traditionally relied on line-of-sight (LoS) simulations, which capture geometric occlusion. However, these approaches depend on accurate 3D data that is often unavailable and may not adequately represent how visually distinctive urban landmarks are encountered in real streetscapes. We reformulate landmark visibility assessment as an urban visual search problem in image space by leveraging the widespread availability of street view imagery (SVI). Given a reference image of a target landmark, a Vision Language Model (VLM) is applied to detect the landmark in direction- and zoom-controlled SVI. A successful detection indicates machine-recognised landmark visibility at the corresponding viewpoint. Beyond isolated viewpoints, we construct a heterogeneous visibility graph to represent visual connectivity among landmarks, street-view locations, and the…
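As a rough illustration of the graph step described in the abstract, the sketch below turns per-viewpoint detection outcomes into a simple two-way visibility mapping. It is a minimal sketch and not the authors' implementation: the record layout, the identifiers (`v1`, `tower`, and so on), and the helper names `build_visibility_graph` and `co_visibility` are all hypothetical, the detection results are invented toy values, and only two node types (viewpoints and landmarks) are modelled because the abstract is cut off before naming the rest.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy records: (viewpoint_id, landmark_id, detected).
# In the paper, "detected" would come from a VLM applied to street view imagery.
detections = [
    ("v1", "tower", True),
    ("v1", "bridge", True),
    ("v2", "tower", True),
    ("v2", "bridge", False),
    ("v3", "bridge", True),
    ("v3", "dome", True),
]


def build_visibility_graph(records):
    """Return (viewpoint -> visible landmarks, landmark -> viewpoints seeing it)."""
    seen_from = defaultdict(set)
    visible_at = defaultdict(set)
    for viewpoint, landmark, detected in records:
        if detected:  # only successful detections become edges
            seen_from[viewpoint].add(landmark)
            visible_at[landmark].add(viewpoint)
    return dict(seen_from), dict(visible_at)


def co_visibility(seen_from):
    """Count the viewpoints from which each pair of landmarks is visible together."""
    counts = defaultdict(int)
    for landmarks in seen_from.values():
        for a, b in combinations(sorted(landmarks), 2):
            counts[(a, b)] += 1
    return dict(counts)


seen_from, visible_at = build_visibility_graph(detections)

# Viewpoints that see two or more landmarks act as mediating locations.
mediators = sorted(v for v, lms in seen_from.items() if len(lms) >= 2)
pairs = co_visibility(seen_from)
```

Plain dictionaries of sets keep the example dependency-free; a graph library could stand in for them if node or edge attributes such as heading or zoom level were needed.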
