Learning When to Look: On-Demand Keypoint-Video Fusion for Animal Behavior Analysis

Weihan Li; Jingyang Ke; Yule Wang; Chengrui Li; Anqi Wu

arXiv:2603.07279 · q-bio.QM · March 10, 2026

Open Access

TL;DR

LookAgain is a multimodal framework for animal behavior analysis that combines skeletal keypoints with video, activating visual processing only on the frames that need it. This reduces computational cost while maintaining high accuracy.

Contribution

The paper introduces on-demand visual grounding: video frames are processed selectively, based on keypoint ambiguity, which improves efficiency on long-duration animal behavior recordings.
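
To make the idea of selective processing concrete, here is a minimal, hypothetical sketch in Python. It is not the authors' implementation: the per-frame `ambiguity` scores, the fixed `threshold`, the additive fusion, and the names `select_frames`, `fuse_on_demand`, and `visual_encoder` are all assumptions introduced for illustration. In the paper the gating is learned, whereas here it is reduced to a simple threshold on a given score.

```python
from typing import Callable, List, Sequence, Tuple


def select_frames(ambiguity: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Return indices of frames whose (assumed) ambiguity score exceeds the threshold."""
    return [i for i, score in enumerate(ambiguity) if score > threshold]


def fuse_on_demand(
    keypoint_feats: Sequence[Sequence[float]],
    ambiguity: Sequence[float],
    visual_encoder: Callable[[int], Sequence[float]],
    threshold: float = 0.5,
) -> Tuple[List[List[float]], List[int]]:
    """Fuse visual features into keypoint features only for gated frames.

    visual_encoder is a hypothetical callable mapping a frame index to a
    feature vector of the same length as the keypoint features. It is
    called only for frames that pass the gate, which is where the
    compute savings would come from.
    """
    selected = set(select_frames(ambiguity, threshold))
    fused: List[List[float]] = []
    for i, kp in enumerate(keypoint_feats):
        if i in selected:
            vis = visual_encoder(i)  # expensive step, run on demand
            # Additive fusion is an illustrative choice, not the paper's.
            fused.append([k + v for k, v in zip(kp, vis)])
        else:
            fused.append(list(kp))  # keypoints only, no video processing
    return fused, sorted(selected)


if __name__ == "__main__":
    # Toy data: three frames, only the middle one is ambiguous.
    feats = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    scores = [0.1, 0.9, 0.4]
    fused, used = fuse_on_demand(feats, scores, lambda i: [10.0, 10.0])
    print(fused, used)
```

The point of the sketch is only that the visual encoder runs on a subset of frames; how the gate is trained and how the two modalities are actually combined are described in the paper itself.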

Findings

01 Achieves high accuracy with fewer processed frames

02 Reduces computational cost significantly

03 Effective for both single- and multi-animal datasets

Abstract

Understanding animal behavior from video is essential for neuroscience research. Modern laboratories typically collect two complementary data streams: skeletal keypoints from pose estimation tools and raw video recordings. Keypoint-based methods are efficient but suffer from geometric ambiguity, environmental blindness, and sensitivity to occlusions. Video-based methods capture rich context but require processing every frame, making them impractical for the hundreds of hours of recordings that modern experiments produce. We introduce LookAgain, a multimodal framework that combines the efficiency of keypoints with the representational power of video through on-demand visual grounding. During training, LookAgain uses dense visual features to pretrain a motion encoder and to train a gating module that learns which frames require visual context. During inference, this gating module…


Taxonomy

Topics: Zebrafish Biomedical Research Applications · Human Pose and Action Recognition · Human Motion and Animation