Learning When to Look: On-Demand Keypoint-Video Fusion for Animal Behavior Analysis
Weihan Li, Jingyang Ke, Yule Wang, Chengrui Li, Anqi Wu

TL;DR
LookAgain is a multimodal framework that combines keypoints and video for animal behavior analysis. It cuts computational cost while maintaining high accuracy by activating visual processing only on the frames that need it.
Contribution
It introduces an on-demand visual grounding method that selectively processes video frames based on keypoint ambiguity, improving efficiency in long-duration animal behavior recordings.
Findings
Achieves high accuracy with fewer processed frames
Reduces computational cost significantly
Effective for both single- and multi-animal datasets
Abstract
Understanding animal behavior from video is essential for neuroscience research. Modern laboratories typically collect two complementary data streams: skeletal keypoints from pose estimation tools and raw video recordings. Keypoint-based methods are efficient but suffer from geometric ambiguity, environmental blindness, and sensitivity to occlusions. Video-based methods capture rich context but require processing every frame, making them impractical for the hundreds of hours of recordings that modern experiments produce. We introduce LookAgain, a multimodal framework that combines the efficiency of keypoints with the representational power of video through on-demand visual grounding. During training, LookAgain uses dense visual features to pretrain a motion encoder and to train a gating module that learns which frames require visual context. During inference, this gating module…
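To make the idea of on-demand visual grounding concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical logistic gate over flattened keypoints (`gate_scores`), a hypothetical `threshold`, a stand-in `visual_encoder` callable, zero-filled visual features for skipped frames, and fusion by simple concatenation. None of these choices are stated in the abstract; in LookAgain the gate is learned during training with the help of dense visual features.

```python
import numpy as np


def gate_scores(keypoints, weights, bias):
    """Hypothetical gate: one logistic score per frame from flattened keypoints.

    keypoints: (T, K, 2) array of K two-dimensional keypoints over T frames.
    weights:   (K * 2,) array; bias: float. Both are assumed already trained.
    """
    flat = keypoints.reshape(len(keypoints), -1)
    return 1.0 / (1.0 + np.exp(-(flat @ weights + bias)))


def fuse_on_demand(keypoints, frames, visual_encoder, visual_dim,
                   weights, bias, threshold=0.5):
    """Run the visual encoder only on frames whose gate score exceeds threshold.

    Returns the fused per-frame features and the indices of the frames
    that were actually sent through the visual encoder.
    """
    scores = gate_scores(keypoints, weights, bias)
    selected = np.flatnonzero(scores > threshold)

    # Skipped frames keep an all-zero visual feature (an assumption of this sketch).
    visual = np.zeros((len(keypoints), visual_dim))
    if selected.size > 0:
        visual[selected] = visual_encoder(frames[selected])

    # Fusion by concatenation, chosen here only for simplicity.
    flat = keypoints.reshape(len(keypoints), -1)
    return np.concatenate([flat, visual], axis=1), selected
```

In this sketch the visual cost scales with `len(selected)` rather than with the total number of frames, which is the efficiency argument the abstract makes for long recordings.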
Taxonomy
Topics: Zebrafish Biomedical Research Applications · Human Pose and Action Recognition · Human Motion and Animation
