Great Ape Detection in Challenging Jungle Camera Trap Footage via Attention-Based Spatial and Temporal Feature Blending

Xinyu Yang; Majid Mirmehdi; Tilo Burghardt

arXiv:1908.11240·cs.CV·August 30, 2019·1 cites

TL;DR

This paper introduces a multi-frame video detection framework that uses attention-based spatial and temporal feature blending to detect great apes in challenging jungle camera trap footage, and reports more robust detection than frame-based detectors.

Contribution

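The paper presents what the authors describe as the first multi-frame video object detection framework trained to detect great apes, extending a feature pyramid architecture with self-attention driven feature blending in the spatial and temporal domains.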

Findings

01

Outperforms frame-based detectors in challenging conditions

02

Remains robust on real-world camera trap clips with significant partial occlusion, challenging lighting, dynamic backgrounds, and natural camouflage

03

Evaluated on 500 camera trap videos (180K frames) from the Pan African Programme, manually annotated with per-frame animal bounding boxes

Abstract

We propose the first multi-frame video object detection framework trained to detect great apes. It is applicable to challenging camera trap footage in complex jungle environments and extends a traditional feature pyramid architecture by adding self-attention driven feature blending in both the spatial and the temporal domain. We demonstrate that this extension can detect distinctive species appearance and motion signatures despite significant partial occlusion. We evaluate the framework using 500 camera trap videos of great apes from the Pan African Programme containing 180K frames, which we manually annotated with accurate per-frame animal bounding boxes. These clips contain significant partial occlusions, challenging lighting, dynamic backgrounds, and natural camouflage effects. We show that our approach performs highly robustly and significantly outperforms frame-based…
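To make the idea of self-attention driven feature blending more concrete, the following is a minimal sketch in plain Python. It uses generic scaled dot-product attention with a residual connection and no learned projections. This is an illustrative assumption, not the authors' actual modules. The names `attention_blend` and `blend_frame` are hypothetical, and each frame is assumed to be a flattened feature map (a list of per-location feature vectors) taken from one level of a feature pyramid.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention_blend(query, context):
    """Blend context vectors into `query` (hypothetical helper).

    query:   one feature vector, e.g. a location in the reference frame
    context: list of feature vectors to attend over
    Returns (query + attention-weighted context, attention weights).
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, c) / scale for c in context])
    attended = [
        sum(w * c[i] for w, c in zip(weights, context))
        for i in range(len(query))
    ]
    # Residual connection: the original feature is kept and enriched.
    blended = [q + a for q, a in zip(query, attended)]
    return blended, weights


def blend_frame(ref_frame, neighbour_frames):
    """Temporal blending followed by spatial blending (assumed ordering)."""
    # Temporal step: attend over all locations of the reference frame
    # and its neighbouring frames.
    temporal_context = [v for f in [ref_frame] + neighbour_frames for v in f]
    temporal = [attention_blend(q, temporal_context)[0] for q in ref_frame]
    # Spatial step: attend over locations within the blended frame itself.
    return [attention_blend(q, temporal)[0] for q in temporal]


if __name__ == "__main__":
    ref = [[1.0, 0.0], [0.0, 1.0]]          # two locations, two channels
    neighbours = [[[1.0, 0.0], [0.0, 1.0]]]  # one neighbouring frame
    for vec in blend_frame(ref, neighbours):
        print(vec)
```

In a real detector the queries, keys, and values would typically pass through learned projections, and the blended features would feed the detection head. Both are omitted here to keep the sketch short.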

Taxonomy

Topics: Video Surveillance and Tracking Methods · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning