Transfer Learning with Self-Supervised Vision Transformers for Snake Identification
Anthony Miyaguchi, Murilo Gustineli, Austin Fischer, and Ryan Lundqvist

TL;DR
This paper explores self-supervised Vision Transformers, specifically DINOv2, for identifying snake species from images, and reports promising results on a dataset of 182,261 images through embedding analysis and linear classification.
Contribution
The study applies DINOv2 vision transformer embeddings to snake identification, trains a linear classifier on them, and uses embedding analysis on a large-scale dataset to show how the embeddings are structured.
Findings
Achieved a score of 39.69 in the competition
Demonstrated the potential of DINOv2 embeddings for species classification
Provided analysis of embedding structure and its relation to classification performance
Abstract
We present our approach for the SnakeCLEF 2024 competition to predict snake species from images. We explore and use Meta's DINOv2 vision transformer model for feature extraction to tackle species' high variability and visual similarity in a dataset of 182,261 images. We perform exploratory analysis on embeddings to understand their structure, and train a linear classifier on the embeddings to predict species. Despite achieving a score of 39.69, our results show promise for DINOv2 embeddings in snake identification. All code for this project is available at https://github.com/dsgt-kaggle-clef/snakeclef-2024.
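The linear-classification step described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the embeddings here are synthetic random vectors standing in for DINOv2 features, and every name and setting (`make_synthetic_embeddings`, `train_linear_classifier`, the 8-dimensional vectors, the 3 classes, the learning rate) is a hypothetical choice made for illustration. The sketch only shows the general idea of keeping the feature extractor frozen and fitting a softmax (multinomial logistic regression) layer on top of precomputed embeddings.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_for(weights, biases, x):
    """One logit per class: w_c . x + b_c."""
    return [sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
            for w, b in zip(weights, biases)]


def train_linear_classifier(embeddings, labels, num_classes, lr=0.1, epochs=50):
    """Fit softmax regression on fixed embeddings with plain SGD."""
    dim = len(embeddings[0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    biases = [0.0] * num_classes
    order = list(range(len(embeddings)))
    rng = random.Random(0)
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = embeddings[i], labels[i]
            probs = softmax(logits_for(weights, biases, x))
            for c in range(num_classes):
                # Gradient of cross-entropy w.r.t. logit c is p_c - 1[c == y].
                grad = probs[c] - (1.0 if c == y else 0.0)
                biases[c] -= lr * grad
                for j in range(dim):
                    weights[c][j] -= lr * grad * x[j]
    return weights, biases


def predict(weights, biases, x):
    """Return the index of the highest-scoring class."""
    scores = logits_for(weights, biases, x)
    return max(range(len(scores)), key=scores.__getitem__)


def make_synthetic_embeddings(num_classes=3, per_class=30, dim=8,
                              noise=0.3, seed=42):
    """Hypothetical stand-in for image embeddings: noisy class centroids."""
    rng = random.Random(seed)
    centroids = [[rng.gauss(0, 1) for _ in range(dim)]
                 for _ in range(num_classes)]
    X, y = [], []
    for c, centre in enumerate(centroids):
        for _ in range(per_class):
            X.append([v + rng.gauss(0, noise) for v in centre])
            y.append(c)
    return X, y


X, y = make_synthetic_embeddings()
W, b = train_linear_classifier(X, y, num_classes=3)
accuracy = sum(predict(W, b, x) == t for x, t in zip(X, y)) / len(y)
print(f"training accuracy on synthetic embeddings: {accuracy:.2f}")
```

In a real pipeline the rows of `X` would be embeddings computed once by the frozen vision transformer, and the classifier would be evaluated on held-out images rather than on its own training set. Because only the linear layer is trained, the cost of fitting stays small even when the number of images is large.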
Taxonomy
Topics: Hand Gesture Recognition Systems · Digital Imaging for Blood Diseases · Robot Manipulation and Learning
Methods: Attention Is All You Need · Softmax · Layer Normalization · Linear Layer · Dense Connections · Residual Connection · Multi-Head Attention · Vision Transformer
