DynamicVL: Benchmarking Multimodal Large Language Models for Dynamic City Understanding

Weihao Xuan; Junjue Wang; Heli Qi; Zihang Chen; Zhuo Zheng; Yanfei Zhong; Junshi Xia; Naoto Yokoya

arXiv:2505.21076 · cs.CV · October 28, 2025

TL;DR

This paper introduces DVL-Suite, a comprehensive framework and benchmark for evaluating and enhancing multimodal large language models in long-term urban analysis using multi-temporal remote sensing imagery.

Contribution

It provides a large-scale dataset, diverse urban understanding tasks, and a baseline model to improve long-term city dynamics analysis with multimodal models.

Findings

1. State-of-the-art models show limitations in long-term temporal understanding.
2. The DVL-Instruct dataset improves model capabilities in multi-temporal analysis.
3. DVLChat enables integrated question answering and segmentation for urban insights.

Abstract

Multimodal large language models (MLLMs) have demonstrated remarkable capabilities in visual understanding, but their application to long-term Earth observation analysis remains limited, primarily focusing on single-temporal or bi-temporal imagery. To address this gap, we introduce DVL-Suite, a comprehensive framework for analyzing long-term urban dynamics through remote sensing imagery. Our suite comprises 14,871 high-resolution (1.0m) multi-temporal images spanning 42 major cities in the U.S. from 2005 to 2023, organized into two components: DVL-Bench and DVL-Instruct. The DVL-Bench includes six urban understanding tasks, from fundamental change detection (pixel-level) to quantitative analyses (regional-level) and comprehensive urban narratives (scene-level), capturing diverse urban dynamics including expansion/transformation patterns, disaster assessment, and environmental…
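To make the difference between the pixel-level and regional-level tasks concrete, here is a minimal sketch. It assumes two co-registered binary land-cover masks of the same tile at two dates (1 = built-up, 0 = other). The mask format and the function names `change_mask` and `built_up_area_m2` are hypothetical illustrations, not DVL-Suite's data format or evaluation code. The only detail taken from the abstract is the 1.0 m resolution, under which each pixel covers 1 m².

```python
from typing import List

# Hypothetical mask format: a 2-D grid of 0/1 labels, where 1 marks a
# built-up pixel. This is NOT DVL-Suite's released data format.
Mask = List[List[int]]


def change_mask(before: Mask, after: Mask) -> Mask:
    """Pixel-level view: 1 where the label differs between the two dates."""
    if len(before) != len(after) or any(
        len(rb) != len(ra) for rb, ra in zip(before, after)
    ):
        raise ValueError("masks must be co-registered and the same shape")
    return [
        [int(b != a) for b, a in zip(row_b, row_a)]
        for row_b, row_a in zip(before, after)
    ]


def built_up_area_m2(mask: Mask, pixel_size_m: float = 1.0) -> float:
    """Regional-level view: built-up area in square metres.

    With the 1.0 m resolution stated in the abstract, each pixel covers 1 m^2.
    """
    return sum(sum(row) for row in mask) * pixel_size_m ** 2


if __name__ == "__main__":
    # Toy 3x3 tile observed at two dates (illustrative values only).
    tile_2005 = [[0, 0, 1], [0, 1, 1], [0, 0, 0]]
    tile_2023 = [[0, 1, 1], [1, 1, 1], [0, 0, 1]]

    changed = change_mask(tile_2005, tile_2023)
    print(changed)  # [[0, 1, 0], [1, 0, 0], [0, 0, 1]]
    growth = built_up_area_m2(tile_2023) - built_up_area_m2(tile_2005)
    print(f"net built-up growth: {growth} m^2")  # 3.0 m^2
```

The change mask flags every label transition, including demolition (1 → 0). The regional figure, by contrast, is a net balance. The two agree in this toy tile only because every change is new construction. Scene-level narratives, the third granularity, are not sketched here.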

Code & Models

Datasets

weihao1115/dvl_suite (dataset)

Taxonomy

Topics: Human Mobility and Location-Based Analysis · Geographic Information Systems Studies