Self-supervised Learning on Camera Trap Footage Yields a Strong Universal Face Embedder

Vladimir Iashin; Horace Lee; Dan Schofield; Andrew Zisserman

arXiv:2507.10552·cs.CV·July 15, 2025

Open Access

TL;DR

This paper presents a self-supervised learning method using Vision Transformers to create a universal chimpanzee face embedder from unlabeled camera trap footage, outperforming supervised methods in re-identification tasks.

Contribution

It introduces a novel self-supervised approach for learning face embeddings from unlabeled wildlife footage, eliminating the need for manual labels and improving re-identification performance.

Findings

01

Outperforms supervised baselines on challenging open-set re-identification benchmarks such as Bossou

02

Uses only unlabeled camera trap footage for training

03

Shows the potential of self-supervised learning for scalable, non-invasive population studies

Abstract

Camera traps are revolutionising wildlife monitoring by capturing vast amounts of visual data; however, the manual identification of individual animals remains a significant bottleneck. This study introduces a fully self-supervised approach to learning robust chimpanzee face embeddings from unlabeled camera-trap footage. Leveraging the DINOv2 framework, we train Vision Transformers on automatically mined face crops, eliminating the need for identity labels. Our method demonstrates strong open-set re-identification performance, surpassing supervised baselines on challenging benchmarks such as Bossou, despite utilising no labelled data during training. This work underscores the potential of self-supervised learning in biodiversity monitoring and paves the way for scalable, non-invasive population studies.
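To make the open-set re-identification setting concrete, here is a minimal sketch of how face embeddings might be used at query time: a query embedding is compared with a gallery of embeddings from known individuals, and it is rejected as unknown when the best match is too weak. The abstract does not describe the evaluation protocol, so the cosine-similarity matching, the `threshold` value, the function names, and the toy 3-dimensional vectors are all illustrative assumptions rather than the authors' method.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify(query, gallery, threshold=0.8):
    """Return (identity, score) for the nearest gallery embedding.

    The identity is None when the best score is below `threshold`,
    which is how this sketch handles individuals absent from the gallery.
    `threshold` is a hypothetical value chosen for the toy data below.
    """
    best_name, best_score = None, -1.0
    for name, embeddings in gallery.items():
        for emb in embeddings:
            score = cosine_similarity(query, emb)
            if score > best_score:
                best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score


# Toy gallery: hypothetical identities with made-up 3-d "embeddings".
gallery = {
    "chimp_a": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
    "chimp_b": [[0.0, 1.0, 0.0]],
}

known = identify([0.95, 0.05, 0.0], gallery)   # close to chimp_a's embeddings
unknown = identify([0.0, 0.0, 1.0], gallery)   # orthogonal to every gallery entry
```

In this sketch the first query is matched to `chimp_a` and the second is returned with identity `None`. With a real embedder, the vectors would come from the trained Vision Transformer, and the threshold would need to be tuned on held-out data.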

Taxonomy

Topics: Face recognition and analysis · Video Surveillance and Tracking Methods · Face and Expression Recognition