6th Place Solution to Google Universal Image Embedding

S. Gkelios; A. Kastellos; S. Chatzichristofis

arXiv:2210.09377 · cs.CV · October 19, 2022 · 1 citation

Open Access

TL;DR

This paper describes the 6th place solution to the Google Universal Image Embedding competition on Kaggle. It combines the CLIP architecture, SubCenter ArcFace loss with dynamic margins, and a custom dataset, and scored 0.685 on the private leaderboard.

Contribution

The paper combines a pre-trained CLIP model, SubCenter ArcFace loss with dynamic margins, and a purpose-built dataset to improve image embedding performance.

Findings

01

Achieved a score of 0.685 on the private leaderboard

02

Utilized CLIP architecture for visual representation

03

Enhanced transfer learning with a tailored training scheme

Abstract

This paper presents the 6th place solution to the Google Universal Image Embedding competition on Kaggle. Our approach is based on the CLIP architecture, a powerful pre-trained model that learns visual representations from natural language supervision. We also used the SubCenter ArcFace loss with dynamic margins to improve class separability and the discriminative power of the embeddings. Finally, we created a diverse dataset based on the test set's categories and feedback from the leaderboard. By carefully crafting a training scheme to enhance transfer learning, our submission scored 0.685 on the private leaderboard.
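To make the loss named in the abstract more concrete, here is a minimal sketch of SubCenter ArcFace with per-class margins, written in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function names, the scale of 30, the margin range, and the count-based rule in `dynamic_margins` (rarer classes get larger margins) are all hypothetical choices that the abstract does not specify.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (assumes v is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dynamic_margins(class_counts, m_min=0.2, m_max=0.45, power=-0.25):
    """Hypothetical rule: map class frequency to a margin in [m_min, m_max].

    Rarer classes receive larger margins. The paper's actual rule may differ.
    """
    raw = [n ** power for n in class_counts]
    lo, hi = min(raw), max(raw)
    if hi == lo:
        return [m_max for _ in raw]
    return [m_min + (r - lo) / (hi - lo) * (m_max - m_min) for r in raw]


def subcenter_arcface_logits(embedding, centers, label, margins, scale=30.0):
    """centers[c] is a list of sub-center vectors for class c.

    For each class, keep the cosine to its closest sub-center, then add the
    class margin to the angle of the ground-truth class only.
    """
    e = l2_normalize(embedding)
    logits = []
    for c, subcenters in enumerate(centers):
        cos = max(
            sum(a * b for a, b in zip(e, l2_normalize(w))) for w in subcenters
        )
        cos = max(-1.0, min(1.0, cos))  # guard acos against rounding error
        if c == label:
            m = margins[c]
            theta = math.acos(cos)
            if theta + m <= math.pi:
                cos = math.cos(theta + m)
            else:
                # common fallback that keeps the penalty monotonic
                cos = cos - m * math.sin(m)
        logits.append(scale * cos)
    return logits


def cross_entropy(logits, label):
    """Numerically stable softmax cross-entropy for one sample."""
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum - logits[label]


# Toy usage: 2 classes, 2 sub-centers each, 2-D embeddings.
centers = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [-1.0, 0.0]]]
margins = dynamic_margins([1000, 10])
logits = subcenter_arcface_logits([1.0, 1.0], centers, 0, margins)
loss = cross_entropy(logits, 0)
```

Taking the maximum cosine over sub-centers lets noisy or visually diverse samples of a class attach to different sub-centers, while the margin makes the ground-truth class harder to satisfy and so pushes classes apart in angle.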

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · COVID-19 diagnosis using AI

Methods: Contrastive Language-Image Pre-training · Additive Angular Margin Loss