2nd Place Solution to Facebook AI Image Similarity Challenge Matching Track
SeungKee Jeon

TL;DR
This paper describes the 2nd place solution to the Facebook AI Image Similarity Challenge Matching Track. It combines self-supervised learning with a Vision Transformer (ViT) that takes a concatenated query and reference image and predicts directly whether they match.
Contribution
The key idea is to concatenate the query and reference images into a single image and train a ViT to predict directly whether the query image used the reference image.
Findings
Achieved 0.8291 Micro-average Precision on private leaderboard
Utilized self-supervised learning with Vision Transformer for image matching
Placed 2nd in the Matching Track of the challenge
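The Micro-average Precision figure above can be illustrated with a minimal sketch. It assumes the metric is average precision computed over one global ranking of all predicted pairs from all queries, normalised by the total number of ground-truth matching pairs. The function name, the input format, and the toy data are hypothetical, and the challenge's official scoring code may differ in details such as tie handling.

```python
from typing import List, Tuple


def micro_average_precision(
    predictions: List[Tuple[float, bool]], num_positives: int
) -> float:
    """Average precision over a single global ranking (assumed definition).

    predictions: (confidence, is_correct) for every predicted pair, pooled
                 across all queries.
    num_positives: total number of ground-truth matching pairs, including
                   any that were never predicted.
    """
    if num_positives <= 0:
        raise ValueError("num_positives must be positive")
    # Rank every prediction from all queries together, most confident first.
    ranked = sorted(predictions, key=lambda p: p[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, is_correct) in enumerate(ranked, start=1):
        if is_correct:
            hits += 1
            # Precision at this rank, accumulated only at correct predictions.
            precision_sum += hits / rank
    return precision_sum / num_positives


# Toy data (hypothetical): three predicted pairs, two ground-truth matches.
toy_predictions = [(0.9, True), (0.8, False), (0.7, True)]
toy_score = micro_average_precision(toy_predictions, num_positives=2)
```

Because the ranking is global, a confident wrong prediction for one query lowers the precision counted for correct predictions of every other query ranked below it, so confidence scores need to be comparable across queries.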
Abstract
This paper presents the 2nd place solution to the Facebook AI Image Similarity Challenge: Matching Track on DrivenData. The solution is based on self-supervised learning and a Vision Transformer (ViT). The main breakthrough comes from concatenating the query and reference images into a single image and asking the ViT to predict directly from that image whether the query image used the reference image. The solution scored 0.8291 Micro-average Precision on the private leaderboard.
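The concatenation step described in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how the two images are laid out, so the side-by-side layout, the toy grayscale list-of-rows image format, and the helper names concat_pair and to_patches are all hypothetical. The ViT itself is omitted; the sketch only shows how one combined image yields a single patch sequence in which query and reference patches sit together.

```python
from typing import List

# Toy image format (assumption): grayscale image as a list of pixel rows.
Image = List[List[int]]


def concat_pair(query: Image, reference: Image) -> Image:
    """Place the query and reference side by side to form one image."""
    if len(query) != len(reference):
        raise ValueError("images must have the same height")
    return [q_row + r_row for q_row, r_row in zip(query, reference)]


def to_patches(image: Image, patch: int) -> List[List[int]]:
    """Split an image into non-overlapping patch x patch tiles.

    Each tile is flattened and tiles are returned in row-major order,
    which is how a ViT turns an image into a token sequence.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tokens.append(
                [image[top + dy][left + dx]
                 for dy in range(patch)
                 for dx in range(patch)]
            )
    return tokens


# Toy 2x2 images (hypothetical data).
query = [[1, 1], [1, 1]]
reference = [[2, 2], [2, 2]]
pair = concat_pair(query, reference)   # one 2x4 image
tokens = to_patches(pair, patch=2)     # two tokens: query patch, reference patch
```

With a single combined input, self-attention can compare query patches against reference patches inside one forward pass, and a binary classification head on the output would give the match prediction.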
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
