Towards Zero-Shot Camera Trap Image Categorization

Jiří Vyskočil; Lukas Picek

arXiv:2410.12769·cs.CV·October 17, 2024·3 cites

Open Access

TL;DR

This paper evaluates several approaches to automatic camera trap image categorization. It shows that pairing a detector (MegaDetector) with classifiers improves accuracy and reduces location-specific overfitting, and that a zero-shot pipeline built on foundation models gives competitive results.

Contribution

It introduces a zero-shot categorization pipeline that combines the DINOv2 foundation model with FAISS similarity search, showing competitive results and potential for scalable wildlife monitoring.
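The summary above does not spell out how DINOv2 and FAISS fit together, so the following is only a minimal sketch of one plausible reading: images are turned into embedding vectors, and each query image takes the label of its most similar labeled reference embeddings. To keep it self-contained, hand-written toy vectors stand in for DINOv2 features and a brute-force NumPy cosine search stands in for a FAISS index. The function names, the `k` parameter, and the labels `species_a` and `species_b` are all hypothetical.

```python
import numpy as np


def l2_normalize(vectors):
    """Scale each row to unit length so a dot product equals cosine similarity."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.clip(norms, 1e-12, None)


def knn_classify(ref_embeddings, ref_labels, query_embeddings, k=1):
    """Label each query by majority vote over its k most similar references.

    Assumption: this brute-force search is a stand-in for a FAISS index;
    the embeddings are assumed to come from a frozen image encoder.
    """
    ref = l2_normalize(np.asarray(ref_embeddings, dtype=np.float32))
    qry = l2_normalize(np.asarray(query_embeddings, dtype=np.float32))
    labels = np.asarray(ref_labels)

    similarities = qry @ ref.T                      # shape: (n_queries, n_refs)
    top_k = np.argsort(-similarities, axis=1)[:, :k]

    predictions = []
    for neighbours in top_k:
        values, counts = np.unique(labels[neighbours], return_counts=True)
        predictions.append(str(values[np.argmax(counts)]))
    return predictions


# Toy 2-D "embeddings" (hypothetical, not real DINOv2 features).
reference = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
reference_labels = ["species_a", "species_a", "species_b", "species_b"]
queries = [[0.8, 0.2], [0.2, 0.8]]

print(knn_classify(reference, reference_labels, queries, k=3))
```

No classifier is trained on the target data in this sketch: adding a new category only requires adding labeled reference embeddings.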

Findings

01

Combining MegaDetector with two classifiers reduces error by up to 75%.

02

Background removal halves error in new locations.

03

Zero-shot pipeline with DINOv2 achieves near state-of-the-art accuracy.

Abstract

This paper describes the search for an alternative approach to the automatic categorization of camera trap images. First, we benchmark state-of-the-art classifiers using a single model for all images. Next, we evaluate methods combining MegaDetector with one or more classifiers and Segment Anything to assess their impact on reducing location-specific overfitting. Last, we propose and test two approaches using large language and foundational models, such as DINOv2, BioCLIP, BLIP, and ChatGPT, in a zero-shot scenario. Evaluation carried out on two publicly available datasets (WCT from New Zealand, CCT20 from the Southwestern US) and a private dataset (CEF from Central Europe) revealed that combining MegaDetector with two separate classifiers achieves the highest accuracy. This approach reduced the relative error of a single BEiTV2 classifier by approximately 42% on CCT20, 48% on CEF,…
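The abstract reports improvements as reductions in relative error rather than as accuracy gains. As a hedged illustration of how such a figure is usually computed, assuming error is defined as one minus accuracy, the sketch below uses a hypothetical function name and made-up accuracies that are not results from the paper.

```python
def relative_error_reduction(baseline_accuracy, new_accuracy):
    """Fraction of the baseline error removed by the new method.

    Assumption: error = 1 - accuracy, with accuracies given in [0, 1].
    """
    baseline_error = 1.0 - baseline_accuracy
    new_error = 1.0 - new_accuracy
    if baseline_error <= 0.0:
        raise ValueError("baseline has no error to reduce")
    return (baseline_error - new_error) / baseline_error


# Hypothetical numbers for illustration only: accuracy rises from 80% to 88%,
# so the error falls from 20% to 12%, a relative reduction of about 40%.
reduction = relative_error_reduction(0.80, 0.88)
print(f"{reduction:.0%}")
```

A relative reduction can look large even when the absolute accuracy gain is a few percentage points, which is why the baseline error matters when comparing datasets.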

Taxonomy

Topics: Digital Media Forensic Detection · Advanced Image Processing Techniques · Advanced Optical Sensing Technologies

Methods: BLIP: Bootstrapping Language-Image Pre-training