3rd Place Solution for Google Universal Image Embedding

Nobuaki Aoki; Yasumasa Namba

arXiv:2210.09296·cs.CV·October 18, 2022·1 cites

TL;DR

This paper describes the 3rd place solution to the Google Universal Image Embedding competition on Kaggle: an ArcFace model with a ViT-H/14 backbone from OpenCLIP, trained in two stages, that reaches 0.692 mean Precision@5 on the private leaderboard.

Contribution

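The paper describes a two-stage training approach for an ArcFace model with a ViT-H/14 backbone from OpenCLIP: the first stage trains with the backbone frozen, and the second stage trains the whole model. The solution placed third in the Google Universal Image Embedding competition on Kaggle.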
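The two-stage schedule can be illustrated with a deliberately tiny toy. This is a sketch under stated assumptions, not the authors' training code: the scalar `backbone` and `head` parameters, the squared-error loss, the learning rates, and the epoch counts are all hypothetical stand-ins chosen so the example runs with the standard library alone.

```python
def train_stage(params, trainable, data, lr, epochs):
    """Run SGD on squared error, updating only the parameters named in `trainable`.

    Toy stand-in: `backbone` plays the role of the image encoder and
    `head` the role of the classification head. Both are single scalars.
    """
    for _ in range(epochs):
        for x, y in data:
            feat = params["backbone"] * x      # "embedding" of the input
            pred = params["head"] * feat       # "head" output
            err = pred - y
            grads = {
                "backbone": 2 * err * params["head"] * x,
                "head": 2 * err * feat,
            }
            # Frozen parameters are simply skipped in the update.
            for name in trainable:
                params[name] -= lr * grads[name]
    return params


# Hypothetical starting point: a "pretrained" backbone and a fresh head.
params = {"backbone": 1.0, "head": 0.1}
data = [(1.0, 2.0), (2.0, 4.0)]

# Stage 1: backbone frozen, only the head is trained.
train_stage(params, trainable={"head"}, data=data, lr=0.05, epochs=3)
backbone_after_stage1 = params["backbone"]

# Stage 2: the whole model is trained (here with a smaller step size,
# which is an assumption and not a setting taken from the paper).
train_stage(params, trainable={"backbone", "head"}, data=data, lr=0.01, epochs=50)
backbone_after_stage2 = params["backbone"]
```

In a deep learning framework the same effect is usually obtained by disabling gradients for the backbone parameters during the first stage and re-enabling them for the second.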

Findings

01. Achieved 0.692 mean Precision@5 on the private leaderboard

02. Used ViT-H/14 from OpenCLIP as the backbone

03. Trained in two stages: first with the backbone frozen, then the whole model
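For readers unfamiliar with the metric, the sketch below shows one way to compute mean Precision@5 for a retrieval task. It assumes the variant in which each query's precision is taken over the top min(n, 5) results, where n is the number of index items sharing the query's label. That normalisation, the function names, and the toy labels are assumptions for illustration and are not taken from this page.

```python
def precision_at_k(query_label, retrieved_labels, n_relevant, k=5):
    """Precision@k for a single query.

    retrieved_labels: labels of index items, ranked by descending similarity.
    n_relevant: number of index items that share the query's label.
    """
    cutoff = min(n_relevant, k)
    if cutoff == 0:
        return 0.0
    hits = sum(1 for label in retrieved_labels[:cutoff] if label == query_label)
    return hits / cutoff


def mean_precision_at_k(queries, k=5):
    """Average precision@k over (query_label, retrieved_labels, n_relevant) tuples."""
    scores = [precision_at_k(q, r, n, k) for q, r, n in queries]
    return sum(scores) / len(scores)


# Toy data (hypothetical labels).
queries = [
    # 3 of the top 5 results match the query label.
    ("shoe", ["shoe", "shoe", "bag", "shoe", "hat"], 10),
    # Only 2 relevant items exist, so the top 2 are scored: 1 of 2 match.
    ("car", ["car", "bus", "car"], 2),
]
score = mean_precision_at_k(queries)
```

A score of 1.0 would mean every scored result shares its query's label, so 0.692 corresponds to roughly seven in ten scored results being correct on average.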

Abstract

This paper presents the 3rd place solution to the Google Universal Image Embedding Competition on Kaggle. We use ViT-H/14 from OpenCLIP as the backbone of an ArcFace model and train it in two stages. The first stage trains with the backbone frozen, and the second stage trains the whole model. We achieve a mean Precision@5 of 0.692 on the private leaderboard. Code is available at https://github.com/YasumasaNamba/google-universal-image-embedding
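As background on the ArcFace head mentioned in the abstract, here is a minimal sketch of the additive angular margin logit computation in plain Python. It is not the authors' implementation: the `scale` and `margin` values are common illustrative choices rather than settings reported here, and the sketch omits the handling that full implementations add for angles where adding the margin would exceed pi.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def arcface_logits(embedding, class_weights, target, scale=30.0, margin=0.5):
    """Scaled cosine logits, with an angular margin added to the target class only."""
    e = l2_normalize(embedding)
    logits = []
    for idx, w in enumerate(class_weights):
        w = l2_normalize(w)
        cos = sum(a * b for a, b in zip(e, w))
        cos = max(-1.0, min(1.0, cos))  # guard acos against rounding error
        if idx == target:
            # Additive angular margin: cos(theta + m) instead of cos(theta).
            cos = math.cos(math.acos(cos) + margin)
        logits.append(scale * cos)
    return logits


def cross_entropy(logits, target):
    """Numerically stable softmax cross-entropy for one example."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


# Toy example: a 2-D embedding equally close to both class centres.
embedding = [1.0, 1.0]
class_weights = [[1.0, 0.0], [0.0, 1.0]]

logits = arcface_logits(embedding, class_weights, target=0)
loss_with_margin = cross_entropy(logits, target=0)

plain = arcface_logits(embedding, class_weights, target=0, margin=0.0)
loss_without_margin = cross_entropy(plain, target=0)
```

The margin lowers the target-class logit, so the loss stays high until the embedding moves noticeably closer to its own class centre than to the others. That pressure toward well-separated embeddings is what makes the loss a natural fit for retrieval.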

Taxonomy

Topics: AI in cancer detection · Advanced Image and Video Retrieval Techniques · Brain Tumor Detection and Classification

Methods: Additive Angular Margin Loss