Google Landmark Retrieval 2021 Competition Third Place Solution

Qishen Ha; Bo Liu; Hongwei Zhang

arXiv:2110.04619 · cs.CV · October 12, 2021 · 1 citation

TL;DR

This paper describes the third-place solution to the Google Landmark Retrieval 2021 Challenge and the related fourth-place recognition solution. Both are ensembles of transformer and ConvNet models trained with Sub-center ArcFace and dynamic margins.

Contribution

The paper describes an ensemble that combines transformer and ConvNet models trained with Sub-center ArcFace and dynamic margins. The same training pipeline serves both the retrieval and recognition tracks, with different model selections and post-processing for each.

Findings

01

Transformers significantly outperform ConvNets in retrieval tasks.

02

Ensemble models improve overall accuracy.

03

Finished third in the retrieval track and fourth in the recognition track.

Abstract

We present our solutions to the Google Landmark Challenges 2021, for both the retrieval and the recognition tracks. Both solutions are ensembles of transformers and ConvNet models based on Sub-center ArcFace with dynamic margins. Since the two tracks share the same training data, we used the same pipeline and training approach, but with different model selections for the ensemble and different post-processing. The key improvement over last year is newer state-of-the-art vision architectures, especially transformers, which significantly outperform ConvNets for the retrieval task. We finished in third and fourth place in the retrieval and recognition tracks, respectively.
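To make "Sub-center ArcFace with dynamic margins" concrete, the following is a minimal pure-Python sketch of the loss for a single training example. It is an illustration under stated assumptions, not the authors' implementation: the function names, the scale value, the toy weights, and in particular the margin schedule `a * n**(-lam) + b` (larger margins for rarer classes) are hypothetical choices that the abstract does not specify.

```python
import math

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def dynamic_margin(class_count, a=0.5, lam=0.25, b=0.05):
    # Assumed schedule (not from the abstract): the margin shrinks as the
    # number of training images in the class grows.
    return a * class_count ** (-lam) + b

def subcenter_arcface_logits(embedding, sub_centers, label, margins, scale=30.0):
    # sub_centers[c] holds K weight vectors (sub-centers) for class c.
    e = l2_normalize(embedding)
    logits = []
    for c, centers in enumerate(sub_centers):
        # Sub-center step: keep the best-matching sub-center of each class.
        cos = max(dot(e, l2_normalize(w)) for w in centers)
        cos = max(-1.0, min(1.0, cos))
        if c == label:
            # ArcFace step: add the class margin to the target angle.
            # Clamping at pi is a simplification; production code usually
            # handles angles past pi with a different fallback.
            theta = math.acos(cos)
            cos = math.cos(min(theta + margins[c], math.pi))
        logits.append(scale * cos)
    return logits

def cross_entropy(logits, label):
    # Numerically stable softmax cross-entropy for one example.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

# Toy usage: two classes, two sub-centers each, 2-D embeddings.
counts = [1000, 10]                      # hypothetical images per class
margins = [dynamic_margin(n) for n in counts]
sub_centers = [[[1.0, 0.0], [0.8, 0.6]], [[0.0, 1.0], [-0.6, 0.8]]]
logits = subcenter_arcface_logits([0.9, 0.1], sub_centers, 0, margins)
loss = cross_entropy(logits, 0)
```

The margin only penalizes the target class logit, so the model must separate classes by a wider angle than plain softmax would require. Taking the maximum over sub-centers lets noisy or visually distinct images of one landmark attach to different sub-centers.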

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning

Methods: Additive Angular Margin Loss