Count, Crop and Recognise: Fine-Grained Recognition in the Wild
Max Bain, Arsha Nagrani, Daniel Schofield, Andrew Zisserman

TL;DR
This paper presents a multistage CNN-based approach for fine-grained recognition of individual animals in video, capable of labeling individuals even when their faces are not visible. It also introduces a new chimpanzee dataset and applies a visualization technique to inspect the learned CNN.
Contribution
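The paper introduces the 'Count, Crop and Recognise' (CCR) multistage recognition process for frame-level labeling, compares the recall of frame-based labeling against face-track and body-track labeling, introduces a new dataset for chimpanzee recognition in the wild, and applies a high-granularity visualization technique to the learned CNN.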
Findings
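The CCR staging gives a substantial boost in frame-level recognition performance
Frame-based labeling with CCR shows an advantage in recall over face-track and body-track labeling when the goal is to label every individual in every frame
A new dataset supports research on chimpanzee recognition in the wild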
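To make the recall comparison concrete, here is a minimal sketch of one way frame-level recall could be computed: the fraction of ground-truth (frame, individual) pairs that a method recovers. The function name `frame_level_recall` and the dictionary layout are assumptions made for illustration; the paper's exact evaluation protocol may differ.

```python
from typing import Dict, Hashable, Set


def frame_level_recall(
    truth: Dict[Hashable, Set[str]],
    predicted: Dict[Hashable, Set[str]],
) -> float:
    """Fraction of ground-truth (frame, individual) pairs that were predicted.

    Hypothetical layout: both arguments map a frame id to the set of
    individual identities present (truth) or predicted in that frame.
    """
    total = 0  # number of ground-truth (frame, individual) pairs
    hits = 0   # ground-truth pairs that the method also predicted
    for frame_id, true_ids in truth.items():
        predicted_ids = predicted.get(frame_id, set())
        total += len(true_ids)
        hits += len(true_ids & predicted_ids)
    # Assumption: recall is reported as 0.0 when nothing is annotated.
    return hits / total if total else 0.0
```

Under this definition, a track-based method loses recall in every frame where an individual is present but no face or body track covers it, which is the situation frame-based labeling is meant to address.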
Abstract
The goal of this paper is to label all the animal individuals present in every frame of a video. Unlike previous methods that have principally concentrated on labelling face tracks, we aim to label individuals even when their faces are not visible. We make the following contributions: (i) we introduce a 'Count, Crop and Recognise' (CCR) multistage recognition process for frame level labelling. The Count and Recognise stages involve specialised CNNs for the task, and we show that this simple staging gives a substantial boost in performance; (ii) we compare the recall using frame based labelling to both face and body track based labelling, and demonstrate the advantage of frame based with CCR for the specified goal; (iii) we introduce a new dataset for chimpanzee recognition in the wild; and (iv) we apply a high-granularity visualisation technique to further understand the learned CNN…
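The three stages named in the abstract can be sketched as a small pipeline. This is an illustrative outline only, not the authors' implementation: the three callables (`count_fn`, `propose_fn`, `recognise_fn`) are hypothetical stand-ins for the Count CNN, a region-selection step, and the Recognise CNN, and the abstract does not specify how crop regions are chosen.

```python
from typing import Callable, List, Sequence, Tuple

Frame = Sequence[Sequence[float]]   # hypothetical: a 2-D grid of pixel values
Box = Tuple[int, int, int, int]     # (top, left, bottom, right), half-open


def crop(frame: Frame, box: Box) -> List[List[float]]:
    """Cut the rectangular region `box` out of `frame`."""
    top, left, bottom, right = box
    return [list(row[left:right]) for row in frame[top:bottom]]


def ccr_label_frame(
    frame: Frame,
    count_fn: Callable[[Frame], int],
    propose_fn: Callable[[Frame, int], List[Box]],
    recognise_fn: Callable[[Frame], str],
) -> List[str]:
    """Count, then crop, then recognise; returns one identity per crop."""
    # Count: estimate how many individuals are in the frame.
    n_individuals = count_fn(frame)
    if n_individuals == 0:
        return []  # nothing to label in this frame
    # Crop: pick regions to zoom into (selection strategy is an assumption).
    boxes = propose_fn(frame, n_individuals)
    # Recognise: classify each cropped region independently.
    return [recognise_fn(crop(frame, box)) for box in boxes]
```

The staging keeps each model focused on one task: the recogniser sees a zoomed-in region rather than the whole frame, which is one plausible reading of why the abstract reports a gain from this simple decomposition.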
