How Well Do Vision-Language Models Understand Cities? A Comparative Study on Spatial Reasoning from Street-View Images