A Novel Image Descriptor with Aggregated Semantic Skeleton Representation for Long-term Visual Place Recognition
Nie Jiwei, Feng Joe-Mei, Xue Dingyu, Pan Feng, Liu Wei, Hu Jun, Cheng Shuai

TL;DR
This paper introduces SSR-VLAD, a new image descriptor leveraging aggregated semantic skeletons for robust visual place recognition in urban scenes with appearance variations, improving accuracy over existing methods.
Contribution
The paper proposes SSR-VLAD, a novel semantic skeleton-based image descriptor that encodes spatial and semantic information for enhanced long-term visual place recognition.
Findings
SSR-VLAD outperforms state-of-the-art VPR methods on urban-scene datasets.
SSR-VLAD maintains real-time performance during matching.
Semantic skeleton features improve robustness against appearance changes.
Abstract
In a Simultaneous Localization and Mapping (SLAM) system, loop closure eliminates accumulated errors. It is accomplished by Visual Place Recognition (VPR), a task that retrieves the current scene from a set of pre-stored sequential images by matching scene descriptors. In urban scenes, appearance variation caused by seasons and illumination poses great challenges to the robustness of scene descriptors. Semantic segmentation images convey not only the shapes of objects but also their categories and spatial relations, which are not affected by appearance variation of the scene. Inspired by the Vector of Locally Aggregated Descriptors (VLAD), in this paper we propose a novel image descriptor with aggregated semantic skeleton representation (SSR), dubbed SSR-VLAD, for VPR under drastic appearance variation of environments. The…
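The abstract is truncated before it describes how SSR-VLAD is built, so the following is only a minimal sketch of the standard VLAD aggregation that the descriptor is said to draw on, plus descriptor matching by similarity search. It is not the authors' method: the function names `vlad` and `retrieve`, the toy 2-D local features, and the fixed cluster centers are all hypothetical, and the semantic skeleton features themselves are not modeled.

```python
import numpy as np


def vlad(descriptors: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Plain VLAD (assumed baseline, not SSR-VLAD).

    descriptors: (N, D) local features of one image.
    centers:     (K, D) codebook, e.g. obtained from k-means.
    Returns an L2-normalized vector of length K * D.
    """
    # Squared Euclidean distance from every feature to every center: (N, K).
    diff = descriptors[:, None, :] - centers[None, :, :]
    nearest = (diff ** 2).sum(axis=2).argmin(axis=1)

    k, d = centers.shape
    out = np.zeros((k, d))
    for i in range(k):
        members = descriptors[nearest == i]
        if len(members):
            # Accumulate residuals of the features assigned to center i.
            out[i] = (members - centers[i]).sum(axis=0)

    flat = out.ravel()
    norm = np.linalg.norm(flat)
    return flat / norm if norm > 0 else flat


def retrieve(query: np.ndarray, database: np.ndarray) -> int:
    """Index of the database row most similar to the query.

    With L2-normalized rows, the dot product equals cosine similarity.
    """
    return int((database @ query).argmax())


# Toy codebook and two "images" (hypothetical data for illustration only).
centers = np.array([[0.0, 0.0], [10.0, 10.0]])
image_a = np.array([[1.0, 0.0], [0.0, 1.0], [11.0, 10.0]])
image_b = np.array([[0.0, 0.0], [9.0, 10.0], [10.0, 9.0]])

database = np.stack([vlad(image_a, centers), vlad(image_b, centers)])
match = retrieve(vlad(image_b, centers), database)
```

In a loop-closure setting, `database` would hold one descriptor per stored frame and `match` would index the candidate revisited place; a real system would typically also threshold the similarity before accepting the match.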
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization · Multimodal Machine Learning Applications
