Automated Segmentation and Tracking of Group Housed Pigs Using Foundation Models
Ye Bi, Bimala Acharya, David Rosero, Juan Steibel

TL;DR
This paper introduces a foundation model-based workflow for automated, scalable, and label-efficient monitoring of group-housed pigs, integrating detection, segmentation, and long-term tracking to improve precision livestock farming.
Contribution
It presents a novel FM-centered pipeline combining pretrained vision-language models with modular post-processing for pig monitoring, reducing reliance on extensive labeled data.
Findings
Over 80% of active tracks were fully correct after post-processing.
The system achieved a mean region similarity (J) of 0.83 and MOTA of 0.99.
The approach maintained stable identities over 132-minute videos with no identity switches.
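For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how they are conventionally defined, not the authors' evaluation code. It assumes region similarity J is the Jaccard index (intersection over union) of a predicted and a ground-truth mask, and that MOTA follows the standard definition 1 - (FN + FP + IDSW) / GT. The function names and the toy inputs are hypothetical and chosen only for illustration.

```python
def region_similarity(pred_mask, gt_mask):
    """Jaccard index (IoU) of two binary masks given as equal-shape nested lists."""
    intersection = 0
    union = 0
    for pred_row, gt_row in zip(pred_mask, gt_mask):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                intersection += 1
            if p or g:
                union += 1
    # Convention assumed here: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


def mota(false_negatives, false_positives, id_switches, num_gt_objects):
    """Multiple Object Tracking Accuracy, with error counts summed over all frames."""
    if num_gt_objects <= 0:
        raise ValueError("num_gt_objects must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_gt_objects


# Toy inputs (hypothetical, not data from the paper).
pred = [[1, 1, 0],
        [0, 1, 0]]
gt = [[1, 1, 0],
      [0, 0, 1]]
j_score = region_similarity(pred, gt)  # 2 overlapping pixels out of 4 in the union
mota_score = mota(false_negatives=2, false_positives=1, id_switches=1,
                  num_gt_objects=100)
print(f"J = {j_score:.2f}, MOTA = {mota_score:.2f}")
```

Under these definitions, a mean J of 0.83 would be the per-mask Jaccard index averaged over evaluated masks, and a MOTA of 0.99 would mean that misses, false positives, and identity switches together amount to about 1% of the ground-truth object instances.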
Abstract
Foundation models (FMs) are reshaping computer vision by reducing reliance on task-specific supervised learning and leveraging general visual representations learned at scale. In precision livestock farming, most pipelines remain dominated by supervised learning models that require extensive labeled data, repeated retraining, and farm-specific tuning. This study presents an FM-centered workflow for automated monitoring of group-housed nursery pigs, in which pretrained vision-language FMs serve as general visual backbones and farm-specific adaptation is achieved through modular post-processing. Grounding-DINO was first applied to 1,418 annotated images to establish baseline detection performance. While detection accuracy was high under daytime conditions, performance degraded under night-vision and heavy occlusion, motivating the integration of temporal tracking logic. Building on these…
