CATALOG: A Camera Trap Language-guided Contrastive Learning Model
Julian D. Santamaria, Claudia Isaza, Jhony H. Giraldo

TL;DR
The paper introduces CATALOG, a contrastive learning model that combines foundation models and multi-modal fusion to improve camera-trap image recognition across different domains, addressing domain shift caused by variations in lighting, camouflage, and occlusions.
Contribution
It proposes a novel framework that leverages multiple foundation models with contrastive learning and multi-modal fusion for robust camera-trap image recognition.
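The summary does not say how features from the different foundation models are fused. As a minimal illustration only, the sketch below assumes two text encoders whose outputs already share a dimension, L2-normalizes each, and fuses them with a convex combination weighted by a hypothetical `alpha`. The function names and the weighting scheme are assumptions, not the authors' method.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Scale each row to unit L2 norm so cosine similarity becomes a dot product."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def fuse_text_features(text_a: np.ndarray, text_b: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Hypothetical fusion: convex combination of two normalized text embeddings.

    text_a, text_b: (num_classes, dim) embeddings of the same class descriptions
    produced by two different (hypothetical) text encoders, assumed to already
    share the same dimension.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    fused = alpha * l2_normalize(text_a) + (1.0 - alpha) * l2_normalize(text_b)
    # Re-normalize so the fused embeddings remain on the unit sphere.
    return l2_normalize(fused)


rng = np.random.default_rng(0)
a = rng.normal(size=(3, 8))
b = rng.normal(size=(3, 8))
fused = fuse_text_features(a, b, alpha=0.7)
print(fused.shape)
```

Normalizing before mixing keeps one encoder from dominating simply because its raw embeddings have a larger scale; a learned projection or attention-based fusion would be a natural alternative.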
Findings
Outperforms previous state-of-the-art methods on two benchmark datasets.
Effective in recognizing animals across different geographical areas.
Handles domain shifts due to lighting, camouflage, and occlusions.
Abstract
Foundation Models (FMs) have been successful in various computer vision tasks like image classification, object detection and image segmentation. However, these tasks remain challenging when these models are tested on datasets with different distributions from the training dataset, a problem known as domain shift. This is especially problematic for recognizing animal species in camera-trap images where we have variability in factors like lighting, camouflage and occlusions. In this paper, we propose the Camera Trap Language-guided Contrastive Learning (CATALOG) model to address these issues. Our approach combines multiple FMs to extract visual and textual features from camera-trap data and uses a contrastive loss function to train the model. We evaluate CATALOG on two benchmark datasets and show that it outperforms previous state-of-the-art methods in camera-trap image recognition,…
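The abstract states that CATALOG trains with a contrastive loss over visual and textual features but does not give its exact form. The sketch below uses a CLIP-style symmetric InfoNCE loss as a stand-in: row `i` of the image embeddings is assumed to be paired with row `i` of the text embeddings, and `temperature` is an assumed hyperparameter. A `predict` helper (also hypothetical) shows how such embeddings could assign each image to the most similar class description.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Scale each row to unit L2 norm."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def log_softmax(logits: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def contrastive_loss(image_emb: np.ndarray, text_emb: np.ndarray, temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss (CLIP-style stand-in, not necessarily the paper's loss)."""
    img = l2_normalize(image_emb)
    txt = l2_normalize(text_emb)
    logits = img @ txt.T / temperature  # (batch, batch) scaled cosine similarities
    idx = np.arange(logits.shape[0])
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # each image vs. all texts
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()  # each text vs. all images
    return float(0.5 * (loss_i2t + loss_t2i))


def predict(image_emb: np.ndarray, class_text_emb: np.ndarray) -> np.ndarray:
    """Assign each image to the class whose text embedding is most cosine-similar."""
    sims = l2_normalize(image_emb) @ l2_normalize(class_text_emb).T
    return sims.argmax(axis=1)


rng = np.random.default_rng(0)
txt = rng.normal(size=(4, 16))
img = txt + 0.01 * rng.normal(size=(4, 16))  # images nearly aligned with their texts
aligned = contrastive_loss(img, txt)
shuffled = contrastive_loss(img, txt[::-1])  # every pair mismatched
print(aligned, shuffled, predict(img, txt))
```

Matched pairs yield a much lower loss than mismatched ones, which is the signal that pulls paired image and text embeddings together during training.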
Taxonomy
Topics: Educational Tools and Methods
Methods: Contrastive Learning
