MMS-VPR: Multimodal Street-Level Visual Place Recognition Dataset and Benchmark
Yiwei Ou, Xiaobin Ren, Ronggui Sun, Guansong Gao, Kaiqi Zhao, Manfredo Manfredini

TL;DR
MMS-VPR introduces a comprehensive multimodal street-level dataset and benchmark for visual place recognition, emphasizing pedestrian environments, diverse modalities, and long-term, day-night, and multi-angle data in a non-Western urban setting.
Contribution
The paper presents MMS-VPR, a large-scale multimodal dataset and benchmarking platform specifically designed for pedestrian street-level visual place recognition, addressing limitations of existing vehicle-centric datasets.
Findings
Provides 110,529 images and 2,527 video clips across 208 locations.
Includes multimodal annotations like GPS, timestamps, and textual metadata.
Offers a standardized benchmarking platform for multimodal VPR methods.
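The annotations listed above (GPS coordinates, timestamps, textual metadata) fit naturally into a per-sample record. The sketch below uses a hypothetical layout. The class name `PlaceSample`, its field names, and the helpers `group_by_location` and `is_night` are illustrative assumptions, not the schema or API shipped with MMS-VPR. The day/night split uses arbitrary hour cutoffs rather than any labels the dataset may provide.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


# Hypothetical record layout: field names are illustrative assumptions,
# not the schema distributed with MMS-VPR.
@dataclass
class PlaceSample:
    path: str            # image or video-clip file path
    location_id: int     # label of one of the 208 locations
    lat: float           # GPS latitude in degrees
    lon: float           # GPS longitude in degrees
    timestamp: datetime  # capture or posting time
    text: str = ""       # semantic textual metadata, if any


def group_by_location(samples):
    """Map each location_id to the list of samples recorded there."""
    groups = defaultdict(list)
    for s in samples:
        groups[s.location_id].append(s)
    return dict(groups)


def is_night(sample, dusk_hour=19, dawn_hour=6):
    """Crude day/night split on the timestamp's hour.

    The cutoff hours are arbitrary assumptions, not the dataset's own labels.
    """
    hour = sample.timestamp.hour
    return hour >= dusk_hour or hour < dawn_hour
```

One way to use this, for example, is a cross-condition split: keep daytime samples of each location as the reference set and query with nighttime samples. Whether the benchmark defines its splits this way is not stated here.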
Abstract
Existing visual place recognition (VPR) datasets predominantly rely on vehicle-mounted imagery, offer limited multimodal diversity, and underrepresent dense pedestrian street scenes, particularly in non-Western urban contexts. We introduce MMS-VPR, a large-scale multimodal dataset for street-level place recognition in pedestrian-only environments. MMS-VPR comprises 110,529 images and 2,527 video clips across 208 locations in a ~70,800 m² open-air commercial district in Chengdu, China. Field data were collected in 2024, while social media data span seven years (2019-2025), providing both fine-grained temporal granularity and long-term temporal coverage. Each location features comprehensive day-night coverage, multiple viewing angles, and multimodal annotations including GPS coordinates, timestamps, and semantic textual metadata. We further release MMS-VPRlib, a unified benchmarking…
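VPR benchmarks commonly report Recall@K: the fraction of queries whose correct place appears among the top K candidates. The function below is a minimal sketch that assumes each query comes with one score per location, as either retrieval similarities or classifier logits. `recall_at_k` is a hypothetical helper, not part of MMS-VPRlib. The excerpt above does not say which metrics MMS-VPRlib actually reports.

```python
def recall_at_k(scores, true_ids, k):
    """Fraction of queries whose true location is among the k top-scoring locations.

    scores:   list of rows, one per query, each holding one score per location
    true_ids: list of integer location labels, one per query
    """
    if not true_ids:
        raise ValueError("need at least one query")
    hits = 0
    for row, true_id in zip(scores, true_ids):
        # Rank location indices by descending score (stable sort keeps lower index first on ties).
        ranked = sorted(range(len(row)), key=lambda j: -row[j])
        if true_id in ranked[:k]:
            hits += 1
    return hits / len(true_ids)
```

A plain-Python loop keeps the sketch dependency-free. With many queries and locations, a vectorized top-k over a score matrix would be the usual choice.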
Taxonomy
Topics: Human Mobility and Location-Based Analysis · Video Surveillance and Tracking Methods · Advanced Neural Network Applications
Methods: Greedy Policy Search
