AutoTour: Automatic Photo Tour Guide with Smartphones and LLMs
Huatao Xu, Zihe Liu, Zilin Zeng, Baichuan Li, Mo Li

TL;DR
AutoTour is a system that automatically generates detailed landmark annotations and narratives for user photos by combining visual features with open geospatial data, enabling scalable, context-aware photo-guided exploration.
Contribution
AutoTour introduces a training-free pipeline that fuses visual and geospatial data with LLM-generated descriptions for scalable, context-aware photo tours using open data sources.
Findings
Provides rich, interpretable annotations for landmarks
Enables interactive, context-aware exploration
Bridges visual perception with geospatial understanding
Abstract
We present AutoTour, a system that enhances user exploration by automatically generating fine-grained landmark annotations and descriptive narratives for photos captured by users. The key idea of AutoTour is to fuse visual features extracted from photos with nearby geospatial features queried from open matching databases. Unlike existing tour applications that rely on pre-defined content or proprietary datasets, AutoTour leverages open and extensible data sources to provide scalable and context-aware photo-based guidance. To achieve this, we design a training-free pipeline that first extracts and filters relevant geospatial features around the user's GPS location. It then detects major landmarks in user photos through VLM-based feature detection and projects them into the horizontal spatial plane. A geometric matching algorithm aligns photo features with corresponding geospatial…
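The abstract does not specify how the geometric matching step works, so the following is only a minimal sketch of one plausible approach, not the authors' algorithm. It assumes a pinhole camera with a known compass heading and horizontal field of view, converts each detected photo feature's horizontal image position into a compass bearing, and greedily pairs it with the geospatial feature whose bearing from the user's GPS position is closest. All function names, parameters, and the 10-degree tolerance are hypothetical.

```python
import math

def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2, in degrees clockwise from north."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return math.degrees(math.atan2(y, x)) % 360.0

def pixel_to_bearing(x_norm, heading_deg, hfov_deg):
    """Map a normalized horizontal image position (0 = left edge, 1 = right edge)
    to a compass bearing, assuming a pinhole camera (an assumption of this sketch)."""
    half_fov = math.radians(hfov_deg / 2.0)
    offset = math.degrees(math.atan((2.0 * x_norm - 1.0) * math.tan(half_fov)))
    return (heading_deg + offset) % 360.0

def angular_diff(a, b):
    """Smallest absolute difference between two bearings, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)

def match_features(user, heading_deg, hfov_deg, photo_feats, geo_feats, max_err_deg=10.0):
    """Greedy nearest-bearing matching (hypothetical stand-in for the paper's matcher).

    user:        (lat, lon) of the photographer
    photo_feats: list of (label, x_norm) detected in the photo
    geo_feats:   list of (name, lat, lon) from an open geospatial database
    Returns a dict mapping photo feature label to geospatial feature name.
    """
    matches, used = {}, set()
    for label, x_norm in photo_feats:
        target = pixel_to_bearing(x_norm, heading_deg, hfov_deg)
        best = None
        for name, lat, lon in geo_feats:
            if name in used:
                continue
            err = angular_diff(target, bearing_deg(user[0], user[1], lat, lon))
            if err <= max_err_deg and (best is None or err < best[1]):
                best = (name, err)
        if best is not None:
            matches[label] = best[0]
            used.add(best[0])
    return matches

# Made-up example data: camera facing north with a 60-degree horizontal field of view.
geo = [("Tower", 0.01, 0.0), ("Bridge", 0.01, 0.005), ("Museum", -0.01, 0.0)]
photo = [("tall tower", 0.5), ("bridge", 0.95)]
result = match_features((0.0, 0.0), 0.0, 60.0, photo, geo)
```

A bearing-only matcher like this ignores distance, occlusion, and GPS or compass error, all of which a real system would likely need to handle.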
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Face Recognition and Perception
