WildIng: A Wildlife Image Invariant Representation Model for Geographical Domain Shift
Julian D. Santamaria, Claudia Isaza, Jhony H. Giraldo

TL;DR
WildIng is a novel wildlife image representation model that combines text descriptions with image features to improve the generalization of wildlife identification models across different geographical regions, addressing domain shift issues.
Contribution
WildIng introduces a new approach that integrates textual descriptions with image features to enhance robustness against geographical domain shifts in wildlife monitoring.
Findings
WildIng improves the accuracy of foundation models by 30% under geographical domain shift.
Experiments on datasets from America and Africa demonstrate enhanced generalization.
WildIng outperforms existing models in cross-region wildlife identification.
Abstract
Wildlife monitoring is crucial for studying biodiversity loss and climate change. Camera trap images provide a non-intrusive method for analyzing animal populations and identifying ecological patterns over time. However, manual analysis is time-consuming and resource-intensive. Deep learning, particularly foundation models, has been applied to automate wildlife identification, achieving strong performance when tested on data from the same geographical locations as their training sets. Yet, despite their promise, these models struggle to generalize to new geographical areas, leading to significant performance drops. For example, training an advanced vision-language model, such as CLIP with an adapter, on an African dataset achieves an accuracy of 84.77%. However, this performance drops significantly to 16.17% when the model is tested on an American dataset. This limitation partly arises…
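To make the size of the drop quoted in the abstract concrete, the short sketch below computes the absolute and relative loss in accuracy from the two reported figures (84.77% in-region, 16.17% cross-region). The helper name `accuracy_drop` is hypothetical and chosen only for illustration; it is not part of the paper or of any library.

```python
def accuracy_drop(in_domain: float, out_of_domain: float) -> tuple[float, float]:
    """Return (absolute drop in percentage points, relative drop in percent).

    Hypothetical helper for illustration; both inputs are accuracies in percent.
    """
    absolute = in_domain - out_of_domain
    relative = 100.0 * absolute / in_domain
    return absolute, relative


# Figures quoted in the abstract: CLIP with an adapter trained on an African
# dataset, tested in-region (84.77%) and on an American dataset (16.17%).
absolute, relative = accuracy_drop(84.77, 16.17)
print(f"absolute drop: {absolute:.2f} percentage points")  # 68.60
print(f"relative drop: {relative:.1f}%")                   # 80.9%
```

In other words, under these figures the model loses roughly four fifths of its in-region accuracy when moved to the other region.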
Taxonomy
Topics: Animal Vocal Communication and Behavior · Face recognition and analysis · Domain Adaptation and Few-Shot Learning
