VS-Net: Voting with Segmentation for Visual Localization

Zhaoyang Huang; Han Zhou; Yijin Li; Bangbang Yang; Yan Xu; Xiaowei Zhou; Hujun Bao; Guofeng Zhang; Hongsheng Li

arXiv:2105.10886 · cs.CV · May 25, 2021

TL;DR

VS-Net introduces a segmentation-based approach with a landmark voting mechanism and a prototype-based triplet loss to improve visual localization accuracy by establishing reliable 2D-3D correspondences in complex scenes.
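The TL;DR ends at 2D-3D correspondences; turning them into a camera pose is a Perspective-n-Point (PnP) problem. The summary does not name the solver, so the block below is only a minimal sketch under stated assumptions: known pinhole intrinsics `K`, noise-free correspondences, and a plain linear DLT solve (the function name `dlt_pose` is hypothetical). A practical pipeline would typically wrap a PnP solver in RANSAC, for example OpenCV's `cv2.solvePnPRansac`, to reject bad correspondences.

```python
import numpy as np


def dlt_pose(points_3d, points_2d, K):
    """Estimate a camera pose (R, t) such that x ~ K [R | t] X, via linear DLT.

    points_3d: (N, 3) landmark coordinates in the world frame.
    points_2d: (N, 2) pixel locations of the same landmarks.
    K: (3, 3) pinhole intrinsics. Needs N >= 6 non-coplanar points.
    """
    n = points_3d.shape[0]
    if n < 6:
        raise ValueError("linear DLT needs at least 6 correspondences")
    # Undo the intrinsics so the unknown is P = s * [R | t] for some scale s.
    pix_h = np.hstack([points_2d, np.ones((n, 1))])
    norm = (np.linalg.inv(K) @ pix_h.T).T
    X = np.hstack([points_3d, np.ones((n, 1))])
    # Each correspondence gives two linear equations in the 12 entries of P.
    A = np.zeros((2 * n, 12))
    for i in range(n):
        u, v = norm[i, 0] / norm[i, 2], norm[i, 1] / norm[i, 2]
        A[2 * i, 0:4] = X[i]
        A[2 * i, 8:12] = -u * X[i]
        A[2 * i + 1, 4:8] = X[i]
        A[2 * i + 1, 8:12] = -v * X[i]
    # Null-space direction: right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    P = vt[-1].reshape(3, 4)
    # Choose the sign that puts the points in front of the camera (positive depth).
    if np.mean(P[2] @ X.T) < 0:
        P = -P
    # Project the left 3x3 block onto the nearest rotation; its mean singular
    # value estimates the unknown scale s.
    U, S, Vt = np.linalg.svd(P[:, :3])
    R = U @ Vt
    if np.linalg.det(R) < 0:
        # Guard against a reflection in degenerate or very noisy cases.
        R = U @ np.diag([1.0, 1.0, -1.0]) @ Vt
    t = P[:, 3] / S.mean()
    return R, t
```

With at least six non-coplanar correspondences the linear system generically has a unique solution up to scale; the final SVD projection onto a rotation is what keeps the estimate usable once the inputs are noisy.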

Contribution

The paper presents VS-Net, a framework that combines segmentation and voting to localize scene-specific landmarks, together with a training loss that stays efficient when the number of landmark labels is large.
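One way to picture the voting half of the framework is PVNet-style direction voting: every pixel that the segmentation branch assigns to a landmark patch also predicts a unit vector pointing toward that landmark's 2D projection, and the location agreed on by the most pixels wins. The sketch below assumes this scheme; `vote_landmark`, the hypothesis count, and the cosine threshold are hypothetical choices, not VS-Net's published settings.

```python
import numpy as np


def vote_landmark(pixels, directions, num_hypotheses=128, cos_thresh=0.99, seed=0):
    """RANSAC-style direction voting for one segmented landmark patch.

    pixels: (N, 2) coordinates of pixels labelled with this landmark.
    directions: (N, 2) predicted unit vectors pointing toward the landmark.
    Returns (best 2D hypothesis, number of pixels voting for it); the
    hypothesis is None if every sampled ray pair was near-parallel.
    """
    n = len(pixels)
    if n < 2:
        raise ValueError("need at least two pixels to intersect rays")
    rng = np.random.default_rng(seed)
    best_point, best_score = None, -1
    for _ in range(num_hypotheses):
        i, j = rng.choice(n, size=2, replace=False)
        # Solve p_i + a * d_i = p_j + b * d_j for (a, b).
        A = np.stack([directions[i], -directions[j]], axis=1)
        if abs(np.linalg.det(A)) < 1e-8:
            continue  # near-parallel rays have no stable intersection
        a, _ = np.linalg.solve(A, pixels[j] - pixels[i])
        hyp = pixels[i] + a * directions[i]
        # A pixel votes for hyp if its predicted direction points at hyp.
        offsets = hyp - pixels
        dist = np.linalg.norm(offsets, axis=1)
        cos = np.einsum("ij,ij->i", offsets, directions) / np.maximum(dist, 1e-12)
        score = int(np.count_nonzero(cos > cos_thresh))
        if score > best_score:
            best_point, best_score = hyp, score
    return best_point, best_score
```

Scoring hypotheses by agreement rather than averaging all ray intersections is what lets a minority of badly predicted directions be outvoted instead of dragging the estimate off target.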

Findings

1. Outperforms state-of-the-art methods on multiple benchmarks.
2. Effectively handles up to 5000 landmarks per scene.
3. Uses a novel prototype-based triplet loss for efficient training.
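Why a dedicated loss helps becomes clear from a quick estimate: if a network predicted per-pixel logits for 5000 landmark classes at a 640 × 480 resolution (an assumed size), one image would need 640 × 480 × 5000 = 1.536 × 10^9 float32 values, roughly 6.1 GB, before counting gradients. A prototype-based triplet loss avoids materializing every class score for every pixel. The numpy sketch below is an assumption-laden illustration rather than the paper's formulation: it uses cosine similarity, a fixed margin, a candidate pool of in-batch plus randomly sampled landmark prototypes, and top-k hard negative mining; `prototype_triplet_loss` and all its parameters are hypothetical.

```python
import numpy as np


def prototype_triplet_loss(embeddings, labels, prototypes, margin=0.5,
                           num_random=64, num_negatives=8, seed=0):
    """Hinge loss pulling each pixel embedding toward its landmark prototype.

    embeddings: (N, D) pixel features; labels: (N,) integer landmark ids;
    prototypes: (C, D) one learnable vector per landmark (C may be thousands).
    """
    rng = np.random.default_rng(seed)
    f = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    # Candidate negatives: landmarks present in the batch plus a random pool,
    # so the similarity matrix is N x |cand| rather than N x C.
    pool = rng.choice(len(p), size=min(num_random, len(p)), replace=False)
    cand = np.union1d(np.unique(labels), pool)
    k = min(num_negatives, len(cand) - 1)
    if k < 1:
        raise ValueError("need at least one negative candidate")
    sim = f @ p[cand].T                    # (N, |cand|) cosine similarities
    pos = np.sum(f * p[labels], axis=1)    # similarity to the own prototype
    # Mask each pixel's own landmark so it is never mined as a negative.
    sim[cand[None, :] == labels[:, None]] = -np.inf
    # Hard negative mining: keep the k most similar wrong prototypes per pixel.
    hard = -np.sort(-sim, axis=1)[:, :k]
    # Penalize negatives that come within `margin` of the positive similarity.
    losses = np.maximum(0.0, margin - pos[:, None] + hard)
    return float(losses.mean())
```

Because only the hardest few negatives enter the hinge, pixels already well separated from every candidate contribute zero loss, and the per-step cost scales with the candidate pool rather than with the full landmark count.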

Abstract

Visual localization is of great importance in robotics and computer vision. Recently, scene coordinate regression based methods have shown good performance in visual localization in small static scenes. However, these methods still estimate camera poses from many inferior scene coordinates. To address this problem, we propose a novel visual localization framework that establishes 2D-to-3D correspondences between the query image and the 3D map with a series of learnable scene-specific landmarks. In the landmark generation stage, the 3D surfaces of the target scene are over-segmented into mosaic patches whose centers are regarded as the scene-specific landmarks. To robustly and accurately recover the scene-specific landmarks, we propose the Voting with Segmentation Network (VS-Net) to segment the pixels into different landmark patches with a segmentation branch and estimate the landmark locations…
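The landmark generation stage can be approximated on a point cloud. VS-Net over-segments the scene's 3D surfaces into mosaic patches; the sketch below swaps in a simpler technique, farthest point sampling for seeds followed by nearest-seed assignment, and snaps each patch centroid to its closest member point so the landmark stays on the surface. `generate_landmarks` is a hypothetical name, and the input is assumed to be an `(N, 3)` array of distinct surface samples such as mesh vertices.

```python
import numpy as np


def generate_landmarks(points, num_landmarks, seed=0):
    """Over-segment a point cloud into patches and return one landmark per patch.

    points: (N, 3) distinct surface samples (e.g. mesh vertices).
    Returns (landmarks (K, 3), labels (N,)) where labels[i] is the patch of point i.
    """
    n = len(points)
    if not 1 <= num_landmarks <= n:
        raise ValueError("num_landmarks must be between 1 and the number of points")
    rng = np.random.default_rng(seed)
    # Farthest point sampling spreads the patch seeds evenly over the surface.
    seeds = [int(rng.integers(n))]
    dist = np.linalg.norm(points - points[seeds[0]], axis=1)
    for _ in range(num_landmarks - 1):
        nxt = int(np.argmax(dist))
        seeds.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    # Assign every point to its nearest seed: these are the mosaic patches.
    d2 = ((points[:, None, :] - points[seeds][None, :, :]) ** 2).sum(axis=2)
    labels = np.argmin(d2, axis=1)
    landmarks = np.empty((num_landmarks, 3))
    for k in range(num_landmarks):
        member = points[labels == k]  # never empty: seed k is its own nearest seed
        centroid = member.mean(axis=0)
        # Snap the centroid to the closest member point so it lies on the surface.
        landmarks[k] = member[np.argmin(np.linalg.norm(member - centroid, axis=1))]
    return landmarks, labels
```

The dense N × K distance matrix is fine for small clouds; for thousands of landmarks over millions of points, a KD-tree (for example `scipy.spatial.cKDTree`) would be the usual replacement for the nearest-seed step.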

Code & Models

Repositories

zju3dv/VS-Net (PyTorch, official)

Taxonomy

Topics: Robotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques · Advanced Vision and Imaging

Methods: Triplet Loss