AnimalMotionCLIP: Embedding motion in CLIP for Animal Behavior Analysis

Enmin Zhong; Carlos R. del-Blanco; Daniel Berjón; Fernando Jaureguizar; Narciso García

arXiv:2505.00569·cs.CV·May 2, 2025

TL;DR

AnimalMotionCLIP enhances animal behavior recognition by integrating motion cues and temporal modeling into the CLIP framework, leading to improved accuracy in recognizing fine temporal actions.

Contribution

This work introduces a novel method that incorporates motion information and various temporal aggregation schemes into CLIP for better animal behavior analysis.

Findings

01 Outperforms state-of-the-art methods on the Animal Kingdom dataset

02 Integrating optical flow improves behavior recognition

03 Temporal modeling schemes improve recognition of fine temporal actions

Abstract

Recently, there has been a surge of interest in applying deep learning techniques to animal behavior recognition, particularly leveraging pre-trained visual language models, such as CLIP, due to their remarkable generalization capacity across various downstream tasks. However, adapting these models to the specific domain of animal behavior recognition presents two significant challenges: integrating motion information and devising an effective temporal modeling scheme. In this paper, we propose AnimalMotionCLIP to address these challenges by interleaving video frames and optical flow information in the CLIP framework. Additionally, several temporal modeling schemes based on an aggregation of classifiers are proposed and compared: dense, semi-dense, and sparse. As a result, fine temporal actions can be correctly recognized, which is of vital importance in animal behavior analysis.…
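
The abstract names two ideas: interleaving frames with optical flow, and aggregating classifiers over dense, semi-dense, or sparse temporal windows. The following is only a minimal sketch of how such ideas might look in NumPy, not the authors' implementation. Everything in it is an assumption made for illustration: the function names (`interleave`, `window_starts`, `aggregate_scores`), the rendering of optical flow as images with the same shape as the frames, the mapping of each scheme to a window stride, and the use of a plain mean to aggregate per-window scores.

```python
import numpy as np


def interleave(frames: np.ndarray, flows: np.ndarray) -> np.ndarray:
    """Alternate frames and flow images along the time axis: f0, o0, f1, o1, ...

    Assumption: flow has been rendered as images shaped like the frames,
    e.g. both arrays are (T, H, W, C).
    """
    if frames.shape != flows.shape:
        raise ValueError("frames and flows must have the same shape")
    t = frames.shape[0]
    out = np.empty((2 * t, *frames.shape[1:]), dtype=frames.dtype)
    out[0::2] = frames  # even positions hold appearance
    out[1::2] = flows   # odd positions hold motion
    return out


def window_starts(num_frames: int, window: int, scheme: str) -> list[int]:
    """Start indices of temporal windows for a hypothetical density scheme.

    Assumed strides (not taken from the paper):
    dense = 1, semi_dense = window // 2, sparse = window (no overlap).
    """
    strides = {
        "dense": 1,
        "semi_dense": max(1, window // 2),
        "sparse": window,
    }
    stride = strides[scheme]
    return list(range(0, num_frames - window + 1, stride))


def aggregate_scores(per_window_scores: np.ndarray) -> np.ndarray:
    """Average class scores over windows; input shape (num_windows, num_classes)."""
    return np.mean(per_window_scores, axis=0)
```

Under these assumptions, a denser scheme yields more overlapping windows per clip, so each video-level prediction averages more per-window classifier outputs, at a higher compute cost.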

Taxonomy

Methods: Contrastive Language-Image Pre-training