VS-Net: Voting with Segmentation for Visual Localization
Zhaoyang Huang, Han Zhou, Yijin Li, Bangbang Yang, Yan Xu, Xiaowei Zhou, Hujun Bao, Guofeng Zhang, Hongsheng Li

TL;DR
VS-Net introduces a novel segmentation-based approach with a voting mechanism and a prototype triplet loss to improve visual localization accuracy by establishing reliable 2D-3D correspondences in complex scenes.
Contribution
The paper presents VS-Net, a new framework combining segmentation and voting for scene-specific landmarks, along with an efficient training loss for large label sets.
Findings
Outperforms state-of-the-art methods on multiple benchmarks.
Effectively handles up to 5000 landmarks per scene.
Uses a novel prototype-based triplet loss for efficient training.
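The summary above does not give the formula for the prototype-based triplet loss, so the following is only a minimal sketch of the general idea, assuming one learnable prototype vector per landmark, cosine similarity, and a hardest-negative rule. The function name `prototype_triplet_loss`, the `margin` value, and the choice of similarity are illustrative assumptions, not the authors' definition.

```python
import numpy as np

def prototype_triplet_loss(embeddings, labels, prototypes, margin=0.5):
    """Hypothetical prototype-based triplet loss (illustrative only).

    embeddings: (N, D) per-pixel embedding vectors
    labels:     (N,)   landmark id of each pixel, in [0, K)
    prototypes: (K, D) one prototype vector per landmark
    """
    eps = 1e-12
    # L2-normalize so that the dot product is a cosine similarity.
    e = embeddings / (np.linalg.norm(embeddings, axis=1, keepdims=True) + eps)
    p = prototypes / (np.linalg.norm(prototypes, axis=1, keepdims=True) + eps)
    sim = e @ p.T                                  # (N, K) similarities
    rows = np.arange(len(labels))
    pos = sim[rows, labels]                        # similarity to own prototype
    masked = sim.copy()
    masked[rows, labels] = -np.inf                 # exclude the positive class
    neg = masked.max(axis=1)                       # hardest negative prototype
    # Hinge: the positive should beat the hardest negative by at least `margin`.
    return float(np.maximum(0.0, margin - pos + neg).mean())
```

Comparing each pixel against K prototypes, rather than against other pixels, keeps the cost per pixel linear in the number of landmarks, which is one plausible reason such a loss scales to thousands of labels.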
Abstract
Visual localization is of great importance in robotics and computer vision. Recently, methods based on scene coordinate regression have shown good performance in visual localization in small static scenes. However, they still estimate camera poses from many inferior scene coordinates. To address this problem, we propose a novel visual localization framework that establishes 2D-to-3D correspondences between the query image and the 3D map with a series of learnable scene-specific landmarks. In the landmark generation stage, the 3D surfaces of the target scene are over-segmented into mosaic patches whose centers are regarded as the scene-specific landmarks. To robustly and accurately recover the scene-specific landmarks, we propose the Voting with Segmentation Network (VS-Net) to segment the pixels into different landmark patches with a segmentation branch and estimate the landmark locations…
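The landmark generation stage described in the abstract (over-segmenting the scene surface into patches and taking patch centers as landmarks) can be sketched as follows. The abstract does not say how the surface is partitioned, so this sketch assumes a uniform 3D grid over surface points and uses the mean of the points in each cell as the patch center. The function name `grid_landmarks` and the `cell_size` parameter are hypothetical.

```python
import numpy as np

def grid_landmarks(points, cell_size):
    """Assumed grid-based over-segmentation of 3D surface points.

    points:    (N, 3) surface points of the scene
    cell_size: edge length of each cubic cell (same unit as points)
    Returns (labels, centers): a patch id per point, and one 3D
    landmark per non-empty cell.
    """
    points = np.asarray(points, dtype=float)
    # Integer cell coordinates of every point.
    cells = np.floor(points / cell_size).astype(np.int64)
    # Each distinct occupied cell becomes one patch.
    uniq, labels = np.unique(cells, axis=0, return_inverse=True)
    labels = labels.reshape(-1)  # flatten in case the inverse is not 1-D
    counts = np.bincount(labels, minlength=len(uniq))
    centers = np.zeros((len(uniq), 3))
    np.add.at(centers, labels, points)   # sum the points in each patch
    centers /= counts[:, None]           # mean position = landmark
    return labels, centers
```

The returned `labels` would play the role of per-point segmentation targets, and `centers` the 3D side of the 2D-to-3D correspondences, under the assumptions stated above.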
Taxonomy
Topics: Robotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques · Advanced Vision and Imaging
Methods: Triplet Loss
