AnimalFormer: Multimodal Vision Framework for Behavior-based Precision Livestock Farming
Ahmed Qazi, Taha Razzaq, Asim Iqbal

TL;DR
This paper presents a multimodal vision framework that combines multiple AI models to analyze livestock behavior from videos, enabling non-invasive, detailed monitoring for improved farm management and animal welfare.
Contribution
The paper introduces an integrated multimodal vision framework using GroundingDINO, HQSAM, and ViTPose for comprehensive, non-invasive livestock behavior analysis from video data.
Findings
Accurately detects and segments individual animals in videos.
Provides detailed posture and movement analysis across species.
Applicable to various video resolutions and livestock behaviors.
Abstract
We introduce a multimodal vision framework for precision livestock farming, harnessing the power of GroundingDINO, HQSAM, and ViTPose models. This integrated suite enables comprehensive behavioral analytics from video data without invasive animal tagging. GroundingDINO generates accurate bounding boxes around livestock, while HQSAM segments individual animals within these boxes. ViTPose estimates key body points, facilitating posture and movement analysis. Demonstrated on a sheep dataset with grazing, running, sitting, standing, and walking activities, our framework extracts invaluable insights: activity and grazing patterns, interaction dynamics, and detailed postural evaluations. Applicable across species and video resolutions, this framework revolutionizes non-invasive livestock monitoring for activity detection, counting, health assessments, and posture analyses. It empowers…
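The three-stage flow described in the abstract (detect boxes, segment within each box, estimate keypoints) can be sketched as plain function composition. This is a minimal illustration, not the authors' implementation. The names `analyze_frame`, `count_animals`, and `AnimalObservation` are hypothetical, and the three stages are passed in as generic callables rather than calls to the real GroundingDINO, HQSAM, or ViTPose APIs. Running the pose stage once per detected box is also an assumption.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence, Tuple

# Assumed conventions (not from the paper): boxes are pixel coordinates
# (x_min, y_min, x_max, y_max); keypoints are (x, y) pairs.
Box = Tuple[float, float, float, float]
Point = Tuple[float, float]


@dataclass
class AnimalObservation:
    """Per-animal result for one frame (hypothetical container)."""
    box: Box
    mask: Any                # whatever the segmentation stage returns
    keypoints: List[Point]   # body points from the pose stage


def analyze_frame(
    frame: Any,
    detect: Callable[[Any], Sequence[Box]],
    segment: Callable[[Any, Box], Any],
    estimate_pose: Callable[[Any, Box], List[Point]],
) -> List[AnimalObservation]:
    """Chain detection -> segmentation -> pose estimation for one frame.

    The callables stand in for a detector (e.g. GroundingDINO), a box-prompted
    segmenter (e.g. HQSAM), and a pose estimator (e.g. ViTPose).
    """
    observations = []
    for box in detect(frame):
        mask = segment(frame, box)             # segment the animal inside its box
        keypoints = estimate_pose(frame, box)  # assumed: pose is run per box
        observations.append(AnimalObservation(box, mask, keypoints))
    return observations


def count_animals(observations: Sequence[AnimalObservation]) -> int:
    """Counting reduces to the number of detections in the frame."""
    return len(observations)
```

Passing the stages in as callables keeps the sketch independent of any particular model library, so each stage can be replaced by a stub for testing or by a wrapper around a real model.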
Taxonomy
Topics: Food Supply Chain Traceability
