Google Landmark Retrieval 2021 Competition Third Place Solution
Qishen Ha, Bo Liu, Hongwei Zhang

TL;DR
This paper describes the third-place solution for the Google Landmark Retrieval 2021 Challenge, utilizing ensembles of transformers and ConvNets with advanced training techniques to improve landmark retrieval and recognition accuracy.
Contribution
The paper introduces an ensemble approach that combines transformers and ConvNets trained with Sub-center ArcFace and dynamic margins, placing third in the retrieval track and fourth in the recognition track.
Findings
Transformers significantly outperform ConvNets in retrieval tasks.
Ensemble models improve overall accuracy.
Finished third in the retrieval track and fourth in the recognition track.
Abstract
We present our solutions to the Google Landmark Challenges 2021, for both the retrieval and the recognition tracks. Both solutions are ensembles of transformer and ConvNet models based on Sub-center ArcFace with dynamic margins. Since the two tracks share the same training data, we used the same pipeline and training approach, but with different model selections for the ensemble and different post-processing. The key improvement over last year is the use of newer state-of-the-art vision architectures, especially transformers, which significantly outperform ConvNets for the retrieval task. We finished in third and fourth place in the retrieval and recognition tracks, respectively.
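To make the loss named in the abstract more concrete, the following is a minimal pure-Python sketch of Sub-center ArcFace logits with a per-class margin. It is not the authors' implementation. The function names, the scale value, and in particular the margin schedule in `dynamic_margin` (a margin that shrinks as a class gets more training images) are illustrative assumptions, since the abstract does not give them.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))

def dynamic_margin(class_count, a=0.5, b=0.05):
    # Hypothetical schedule (an assumption, not taken from the abstract):
    # classes with fewer images receive a larger angular margin.
    return a * class_count ** -0.25 + b

def subcenter_arcface_logits(embedding, subcenters, target, class_counts, scale=30.0):
    """subcenters[c] is a list of K sub-center vectors for class c."""
    logits = []
    for cls, centers in enumerate(subcenters):
        # Sub-center step: keep only the best-matching sub-center per class.
        cos_theta = max(cosine(embedding, c) for c in centers)
        cos_theta = max(-1.0, min(1.0, cos_theta))  # guard acos against rounding
        if cls == target:
            # Additive angular margin on the ground-truth class only.
            theta = math.acos(cos_theta)
            margin = dynamic_margin(class_counts[cls])
            cos_theta = math.cos(min(math.pi, theta + margin))
        logits.append(scale * cos_theta)
    return logits

def cross_entropy(logits, target):
    """Numerically stable softmax cross-entropy for a single example."""
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum - logits[target]

# Toy example: 2 classes, 2 sub-centers each, 2-D embeddings.
embedding = [1.0, 0.0]
subcenters = [
    [[1.0, 0.0], [0.0, 1.0]],   # class 0
    [[0.0, 1.0], [-1.0, 0.0]],  # class 1
]
class_counts = [16, 10000]      # made-up training image counts per class
logits = subcenter_arcface_logits(embedding, subcenters, 0, class_counts)
loss = cross_entropy(logits, 0)
```

The margin lowers the target-class logit during training, so the embedding has to sit closer to one of its class sub-centers to reach the same loss. At inference time only the normalized embedding is used for retrieval.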
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning
Methods: Additive Angular Margin Loss
