Multimodal Foundation Models for Zero-shot Animal Species Recognition in Camera Trap Images
Zalan Fabian, Zhongqi Miao, Chunyuan Li, Yuanhan Zhang, Ziwei Liu, Andrés Hernández, Andrés Montes-Rojas, Rafael Escucha, Laura Siabatto, Andrés Link, Pablo Arbeláez, Rahul Dodhia, Juan Lavista Ferres

TL;DR
WildMatch is a zero-shot wildlife species recognition framework that uses multimodal foundation models and instruction tuning to identify animals in camera trap images without requiring labeled training data.
Contribution
The paper introduces WildMatch, a novel zero-shot classification method leveraging multimodal models and knowledge augmentation for wildlife monitoring.
Findings
Effective zero-shot species recognition demonstrated on Colombian camera trap data
Instruction tuning improves detailed animal description generation
Knowledge augmentation enhances caption quality and classification accuracy
Abstract
Due to deteriorating environmental conditions and increasing human activity, conservation efforts directed towards wildlife are crucial. Motion-activated camera traps constitute an efficient tool for tracking and monitoring wildlife populations across the globe. Supervised learning techniques have been successfully deployed to analyze such imagery; however, training such techniques requires annotations from experts. Reducing the reliance on costly labelled data therefore has immense potential in developing large-scale wildlife tracking solutions with markedly less human labor. In this work we propose WildMatch, a novel zero-shot species classification framework that leverages multimodal foundation models. In particular, we instruction-tune vision-language models to generate detailed visual descriptions of camera trap images using similar terminology to experts. Then, we match the…
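The abstract is cut off before it says what the generated description is matched against, so the following is only a minimal sketch of the general idea of zero-shot classification by text matching, not the paper's method. It assumes that a caption has already been produced by some vision-language model and that a small set of per-species text descriptions is available. The bag-of-words cosine similarity, the function names (`tokenize`, `cosine`, `match_species`), and the two example descriptions are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_species(caption, species_descriptions):
    """Score a caption against each species description.

    Returns the best-scoring label and the full score dictionary.
    This simple word-overlap score is a stand-in (an assumption),
    not the matching procedure used in the paper.
    """
    cap = Counter(tokenize(caption))
    scores = {
        label: cosine(cap, Counter(tokenize(desc)))
        for label, desc in species_descriptions.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical descriptions written for this example only.
KNOWLEDGE = {
    "ocelot": "medium sized cat with spotted coat, dark rosettes "
              "and a long ringed tail",
    "collared peccary": "pig like mammal with coarse grey bristly fur "
                        "and a pale collar around the neck",
}

# Hypothetical caption, standing in for vision-language model output.
caption = "a spotted cat with dark rosettes and a long tail walking at night"
best, scores = match_species(caption, KNOWLEDGE)
print(best, scores)
```

No labeled images are needed in this setup: adding a species only requires adding a text description. A practical system would likely replace the word-overlap score with a stronger text-matching model.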
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization
Methods: Balanced Selection
