SVT-Net: Super Light-Weight Sparse Voxel Transformer for Large Scale Place Recognition

Zhaoxin Fan; Zhenbo Song; Hongyan Liu; Zhiwu Lu; Jun He; Xiaoyong Du

arXiv:2105.00149 · cs.CV · December 14, 2021 · 1 citation

TL;DR

SVT-Net is a super light-weight 3D point cloud model that captures both short-range local features and long-range contextual features for large scale place recognition, achieving state-of-the-art accuracy with a model of only 0.9M parameters.

Contribution

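The paper introduces SVT-Net, a lightweight network built on 3D Sparse Convolution (SP-Conv) that combines an Atom-based Sparse Voxel Transformer (ASVT) and a Cluster-based Sparse Voxel Transformer (CSVT) to learn both short-range local features and long-range contextual features for place recognition.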
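To make "transformer over sparse voxels" concrete, here is a minimal sketch of attention in which every non-empty voxel (an "atom") is one token. It assumes plain single-head scaled dot-product attention with a residual connection; the helper names `matvec`, `softmax` and `atom_attention` are hypothetical, and the exact ASVT design (projections, heads, position handling) is described in the paper and not reproduced here.

```python
import math
from typing import Dict, List, Tuple

Voxel = Tuple[int, int, int]   # integer voxel coordinates (x, y, z)
Vec = List[float]
Matrix = List[Vec]             # row-major, shape (d_out, d_in)


def matvec(w: Matrix, x: Vec) -> Vec:
    """Linear projection: (d_out x d_in) matrix times a length-d_in vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def softmax(scores: Vec) -> Vec:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def atom_attention(voxels: Dict[Voxel, Vec],
                   wq: Matrix, wk: Matrix, wv: Matrix) -> Dict[Voxel, Vec]:
    """Hypothetical single-head self-attention where each occupied voxel is a token.

    Empty voxels are simply absent from the dict, so the cost depends on the
    number of occupied voxels, not on the full grid volume. wv must be square
    so that the residual connection lines up with the input features.
    """
    coords = list(voxels)
    q = {c: matvec(wq, voxels[c]) for c in coords}
    k = {c: matvec(wk, voxels[c]) for c in coords}
    v = {c: matvec(wv, voxels[c]) for c in coords}
    d_k = len(q[coords[0]])
    d_v = len(v[coords[0]])
    out: Dict[Voxel, Vec] = {}
    for c in coords:
        scores = [sum(a * b for a, b in zip(q[c], k[o])) / math.sqrt(d_k) for o in coords]
        weights = softmax(scores)
        attended = [sum(w * v[o][j] for w, o in zip(weights, coords)) for j in range(d_v)]
        out[c] = [x + a for x, a in zip(voxels[c], attended)]  # residual connection
    return out


# Usage: three occupied voxels with 2-D features and identity projections.
eye = [[1.0, 0.0], [0.0, 1.0]]
feats = {(0, 0, 0): [1.0, 0.0], (5, 2, 1): [0.0, 1.0], (9, 9, 9): [1.0, 1.0]}
updated = atom_attention(feats, eye, eye, eye)
```

Because attention is computed only among stored voxels, two voxels far apart in space can still exchange information in a single step, which a small sparse-convolution kernel cannot do.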

Findings

1. Achieves state-of-the-art accuracy on benchmark datasets.
2. Maintains high speed with a model size of only 0.9M parameters.
3. Two simplified variants further reduce the model size to 0.8M and 0.4M parameters while preserving performance.
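"Accuracy" in point cloud place recognition is typically reported as retrieval recall: each query scan's global descriptor is matched against a database of descriptors, and a query counts as correct if the retrieved scan was captured near the query's true position. The sketch below computes Recall@1 under that protocol; the function name `recall_at_1` is hypothetical, and the 25 m success radius is an assumption based on the value commonly used in PointNetVLAD-style benchmarks.

```python
import math
from typing import List, Sequence, Tuple

# (global descriptor, ground-truth position in metres)
Entry = Tuple[Sequence[float], Sequence[float]]


def recall_at_1(queries: List[Entry], database: List[Entry], radius_m: float = 25.0) -> float:
    """Fraction of queries whose nearest database descriptor (Euclidean distance)
    was captured within radius_m of the query's true position."""
    hits = 0
    for desc, pos in queries:
        # Retrieval happens in descriptor space; position is used only for scoring.
        _, best_pos = min(database, key=lambda entry: math.dist(desc, entry[0]))
        if math.dist(pos, best_pos) <= radius_m:
            hits += 1
    return hits / len(queries)


database = [([0.0, 0.0], (0.0, 0.0)), ([10.0, 10.0], (100.0, 0.0))]
queries = [([0.1, 0.0], (5.0, 0.0)),     # retrieves entry 0, 5 m away: hit
           ([9.0, 9.0], (300.0, 0.0))]   # retrieves entry 1, 200 m away: miss
score = recall_at_1(queries, database)   # 0.5
```

A brute-force `min` is used for clarity; a real evaluation over thousands of scans would use a KD-tree or batched matrix distances instead.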

Abstract

Point cloud-based large scale place recognition is fundamental for many applications like Simultaneous Localization and Mapping (SLAM). Although many models have been proposed and have achieved good performance by learning short-range local features, long-range contextual properties have often been neglected. Moreover, model size has also become a bottleneck for their wide application. To overcome these challenges, we propose a super light-weight network model termed SVT-Net for large scale place recognition. Specifically, on top of the highly efficient 3D Sparse Convolution (SP-Conv), an Atom-based Sparse Voxel Transformer (ASVT) and a Cluster-based Sparse Voxel Transformer (CSVT) are proposed to learn both short-range local features and long-range contextual features in this model. Consisting of ASVT and CSVT, SVT-Net can achieve state-of-the-art performance on benchmark datasets in terms of…
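The cluster-based idea can be illustrated by grouping voxels into a small number of clusters, letting the cluster tokens attend to each other, and passing the result back to member voxels, so long-range context costs K x K attention instead of N x N. This is a simplified stand-in, not the paper's CSVT: the hard nearest-centroid assignment, mean pooling, fixed `centroids` argument and the name `cluster_attention` are all hypothetical choices made for a short, runnable example.

```python
import math
from typing import Dict, List

Vec = List[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Vec) -> Vec:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cluster_attention(feats: List[Vec], centroids: List[Vec]) -> List[Vec]:
    """Hypothetical cluster-level attention over sparse-voxel features.

    1. Hard-assign each voxel feature to its most similar centroid (dot product).
    2. Mean-pool each non-empty cluster into one token.
    3. Scaled dot-product self-attention among the cluster tokens.
    4. Add each cluster's attended token back to its member voxels (residual).
    """
    k = len(centroids)
    assign = [max(range(k), key=lambda c: dot(f, centroids[c])) for f in feats]
    dim = len(feats[0])
    tokens: Dict[int, Vec] = {}
    for c in range(k):
        members = [f for f, a in zip(feats, assign) if a == c]
        if members:
            tokens[c] = [sum(col) / len(members) for col in zip(*members)]
    ids = list(tokens)
    attended: Dict[int, Vec] = {}
    for c in ids:
        w = softmax([dot(tokens[c], tokens[o]) / math.sqrt(dim) for o in ids])
        attended[c] = [sum(wi * tokens[o][j] for wi, o in zip(w, ids)) for j in range(dim)]
    return [[x + y for x, y in zip(f, attended[a])] for f, a in zip(feats, assign)]


# Usage: four voxel features that fall into two clusters.
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
centroids = [[1.0, 0.0], [0.0, 1.0]]
updated = cluster_attention(feats, centroids)
```

All voxels in the same cluster receive the same context vector, which is what makes this form of attention cheap: its cost grows with the number of clusters rather than the number of occupied voxels.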

Taxonomy

Topics: Robotics and Sensor-Based Localization · 3D Surveying and Cultural Heritage · Indoor and Outdoor Localization Technologies

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Multi-Head Attention · Adam · Layer Normalization · Residual Connection · Label Smoothing · Dropout