Text2Graph VPR: A Text-to-Graph Expert System for Explainable Place Recognition in Changing Environments
Saeideh Yousefzadeh, Hamidreza Pourreza

TL;DR
Text2Graph VPR introduces an explainable, graph-based visual place recognition system that converts images into semantic scene graphs for robust, interpretable localization across changing environments.
Contribution
The paper presents a hybrid approach to explainable place recognition that combines learned Graph Attention Network (GAT) embeddings with Shortest-Path (SP) kernel structural matching over semantic scene graphs.
Findings
Robust retrieval under severe appearance changes.
Zero-shot operation with human textual queries.
Enhanced transparency and diagnostic capability.
Abstract
Visual Place Recognition (VPR) in long-term deployment requires reasoning beyond pixel similarity: systems must make transparent, interpretable decisions that remain robust under lighting, weather and seasonal change. We present Text2Graph VPR, an explainable semantic localization system that converts image sequences into textual scene descriptions, parses those descriptions into structured scene graphs, and reasons over the resulting graphs to identify places. Scene graphs capture objects, attributes and pairwise relations; we aggregate per-frame graphs into a compact place representation and perform retrieval with a dual-similarity mechanism that fuses learned Graph Attention Network (GAT) embeddings and a Shortest-Path (SP) kernel for structural matching. This hybrid design enables both learned semantic matching and topology-aware comparison, and -- critically -- produces…
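As a rough illustration of the dual-similarity idea described in the abstract, the sketch below computes a simple shortest-path kernel between two small labeled scene graphs and fuses it with a cosine similarity over embedding vectors. This is not the authors' implementation: the function names, the weighted-sum fusion, the `alpha` weight, the kernel normalization, and the placeholder vectors standing in for GAT embeddings are all assumptions made for the example.

```python
from collections import Counter, deque
from math import sqrt


def shortest_path_features(nodes, edges):
    """Count (label_a, label_b, hop_distance) triples over connected node pairs.

    nodes: dict mapping integer node id -> object label (hypothetical format)
    edges: list of undirected (id, id) pairs
    """
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    feats = Counter()
    for src in nodes:
        # Breadth-first search gives hop distances in an unweighted graph.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            cur = queue.popleft()
            for nxt in adj[cur]:
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
        for dst, d in dist.items():
            if src < dst:  # count each unordered pair once
                a, b = sorted((nodes[src], nodes[dst]))
                feats[(a, b, d)] += 1
    return feats


def sp_kernel(g1, g2):
    """Normalized shortest-path kernel in [0, 1] (normalization is an assumption)."""
    f1 = shortest_path_features(*g1)
    f2 = shortest_path_features(*g2)
    dot = sum(f1[k] * f2[k] for k in f1)
    norm = sqrt(sum(v * v for v in f1.values()) * sum(v * v for v in f2.values()))
    return dot / norm if norm else 0.0


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm else 0.0


def fused_similarity(emb_q, emb_r, g_q, g_r, alpha=0.5):
    """Weighted sum of embedding and structural similarity (assumed fusion rule)."""
    return alpha * cosine(emb_q, emb_r) + (1 - alpha) * sp_kernel(g_q, g_r)


# Toy scene graphs: the same place in two seasons, and an unrelated place.
summer = ({0: "building", 1: "tree", 2: "road"}, [(0, 1), (1, 2)])
winter = ({0: "building", 1: "tree", 2: "road"}, [(0, 1), (1, 2)])
other = ({0: "car", 1: "bridge"}, [(0, 1)])

# Placeholder vectors standing in for learned graph embeddings.
emb_summer, emb_winter, emb_other = [1.0, 0.0], [0.9, 0.1], [0.0, 1.0]

same_place = fused_similarity(emb_summer, emb_winter, summer, winter)
different_place = fused_similarity(emb_summer, emb_other, summer, other)
```

In this toy setup the structural term depends only on object labels and graph topology, so it is unchanged between the two seasonal graphs, and the matching (label, label, distance) triples that contribute to the score can be listed directly as a form of explanation.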
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization
