Transfer Learning with Self-Supervised Vision Transformers for Snake Identification

Anthony Miyaguchi, Murilo Gustineli, Austin Fischer, and Ryan Lundqvist

arXiv:2407.06178 · cs.CV · July 9, 2024


TL;DR

This paper explores using self-supervised Vision Transformers, specifically DINOv2, to identify snake species from images, and reports promising results on a large dataset using embedding analysis and a linear classifier.

Contribution

The study applies DINOv2 vision transformer embeddings to snake identification, demonstrates their potential for the task, and analyzes the embeddings on a large-scale dataset to examine how their structure relates to classification performance.

Findings

01. Achieved a score of 39.69 in the SnakeCLEF 2024 competition.

02. Demonstrated the potential of DINOv2 embeddings for snake species classification.

03. Analyzed the structure of the embeddings and its relation to classification performance.
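The findings above mention an analysis of embedding structure. As a minimal sketch, assuming principal component analysis (PCA) as the analysis tool (the paper's exact procedure may differ), the code below centers a matrix of image embeddings and reports how much variance the leading components explain. The embeddings are synthetic clusters standing in for DINOv2 features; `pca_explained_variance` and all sizes are hypothetical.

```python
import numpy as np


def pca_explained_variance(embeddings: np.ndarray, n_components: int) -> tuple[np.ndarray, np.ndarray]:
    """Hypothetical helper: project embeddings onto their top principal components.

    Returns (projected, explained_variance_ratio) for the first n_components.
    """
    # Center each feature dimension so the SVD finds directions of maximal variance.
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions; squared singular values give the variance along each.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    variance = s**2 / (len(embeddings) - 1)
    ratio = variance / variance.sum()
    projected = centered @ vt[:n_components].T
    return projected, ratio[:n_components]


# Synthetic stand-in for DINOv2 embeddings: 3 "species" clusters (hypothetical sizes;
# real DINOv2 embeddings are higher-dimensional).
rng = np.random.default_rng(0)
n_species, per_species, dim = 3, 100, 64
centroids = rng.normal(size=(n_species, dim)) * 3.0
labels = np.repeat(np.arange(n_species), per_species)
embeddings = centroids[labels] + rng.normal(size=(len(labels), dim))

projected, ratio = pca_explained_variance(embeddings, n_components=2)
print(projected.shape)  # (300, 2)
print(ratio.round(3))
```

Plotting `projected` colored by `labels` is a quick way to see whether species form separate clusters or overlap, which is the kind of structure that bears on how well a linear classifier can separate them.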

Abstract

We present our approach for the SnakeCLEF 2024 competition to predict snake species from images. We explore and use Meta's DINOv2 vision transformer model for feature extraction to handle the high variability within species and the visual similarity between species in a dataset of 182,261 images. We perform an exploratory analysis of the embeddings to understand their structure and train a linear classifier on them to predict species. Although our submission scored only 39.69, the results suggest that DINOv2 embeddings are promising for snake identification. All code for this project is available at https://github.com/dsgt-kaggle-clef/snakeclef-2024.
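The abstract describes training a linear classifier on frozen embeddings. The sketch below illustrates that idea with multinomial logistic regression (one linear layer followed by softmax) fit by full-batch gradient descent in NumPy. It is an assumed stand-in, not the authors' training code: the synthetic embeddings, the L2 normalization, the learning rate, and the helper names are all hypothetical. In practice the rows of `embeddings` would come from a DINOv2 backbone.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    # Unit-normalize each embedding so feature scale is bounded and a fixed learning rate is stable.
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def train_linear_classifier(x, y, n_classes, lr=0.5, epochs=300, weight_decay=1e-4):
    """Hypothetical helper: softmax regression fit by full-batch gradient descent."""
    n, d = x.shape
    w = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[y]
    for _ in range(epochs):
        logits = x @ w + b
        logits -= logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n  # gradient of mean cross-entropy w.r.t. logits
        w -= lr * (x.T @ grad + weight_decay * w)
        b -= lr * grad.sum(axis=0)
    return w, b


def predict(x, w, b):
    # Predicted species index = argmax of the linear scores.
    return np.argmax(x @ w + b, axis=1)


# Synthetic stand-in for precomputed DINOv2 embeddings (hypothetical sizes).
rng = np.random.default_rng(0)
n_species, per_species, dim = 5, 80, 64
centroids = rng.normal(size=(n_species, dim))
labels = np.repeat(np.arange(n_species), per_species)
embeddings = l2_normalize(centroids[labels] + rng.normal(size=(len(labels), dim)))

# Random train/test split over the 400 synthetic samples.
order = rng.permutation(len(labels))
train, test = order[:300], order[300:]
w, b = train_linear_classifier(embeddings[train], labels[train], n_species)
accuracy = (predict(embeddings[test], w, b) == labels[test]).mean()
print(f"held-out accuracy: {accuracy:.2f}")
```

Keeping the backbone frozen means features are extracted once and only the small linear layer is trained, which keeps a linear probe cheap even on a dataset of 182,261 images.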


Code & Models

Repositories

dsgt-kaggle-clef/snakeclef-2024 (PyTorch, official)


Taxonomy

Topics: Hand Gesture Recognition Systems · Digital Imaging for Blood Diseases · Robot Manipulation and Learning

Methods: Attention Is All You Need · Softmax · Layer Normalization · Linear Layer · Dense Connections · Residual Connection · Multi-Head Attention · Vision Transformer