Lightweight Distillation of SAM 3 and DINOv3 for Edge-Deployable Individual-Level Livestock Monitoring and Longitudinal Visual Analytics

Haiyu Yang; Miel Hostens

arXiv:2604.27128·cs.CV·May 1, 2026


TL;DR

This paper presents a lightweight, distilled vision model pipeline for edge-deployable livestock monitoring, reporting 92.29% MOTA on the Edinburgh Pig dataset and 97.34% top-1 classification accuracy alongside a 3-fold reduction in VRAM usage.

Contribution

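It introduces a distillation approach for SAM 3 and DINOv3 that compresses SAM 3's 446M-parameter Perception Encoder (PE-ViT-L+) backbone into a 40.66M-parameter multi-scale student, enabling efficient on-device livestock monitoring and visual analytics.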
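The abstract names a four-term direction-then-scale distillation loss but does not list its terms here. As a rough illustration of the general idea only, the sketch below combines two assumed terms: a cosine term that aligns the direction of student and teacher feature vectors, and a log-norm term that matches their magnitudes. The function name `direction_scale_loss`, the `scale_weight` parameter, and the choice of terms are all hypothetical, and any ordering or schedule implied by "direction-then-scale" is not modeled.

```python
import math


def direction_scale_loss(student, teacher, scale_weight=0.1, eps=1e-8):
    """Hypothetical two-term loss; NOT the paper's four-term formulation."""
    dot = sum(s * t for s, t in zip(student, teacher))
    s_norm = math.sqrt(sum(s * s for s in student))
    t_norm = math.sqrt(sum(t * t for t in teacher))
    # Direction term: 1 - cosine similarity, close to 0 for parallel vectors.
    direction = 1.0 - dot / (s_norm * t_norm + eps)
    # Scale term: squared log-ratio of norms, 0 when magnitudes match.
    scale = math.log((s_norm + eps) / (t_norm + eps)) ** 2
    return direction + scale_weight * scale
```

Separating the two terms lets a student first be judged on whether its features point the same way as the teacher's, with magnitude mismatch penalized independently through `scale_weight`.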

Findings

01 Achieves 92.29% MOTA on the Edinburgh Pig dataset

02 Reduces VRAM usage 3-fold, enabling deployment on NVIDIA Jetson Orin NX

03 Maintains a high top-1 classification accuracy of 97.34%
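For readers unfamiliar with the tracking metric cited above, MOTA (Multiple Object Tracking Accuracy) is conventionally defined as one minus the ratio of total tracking errors (false negatives, false positives, and identity switches) to the number of ground-truth objects, summed over all frames. The helper below is a minimal sketch of that standard definition; the function name `mota` and the counts in the usage note are illustrative and are not taken from the paper.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """Standard MOTA: 1 - (FN + FP + IDSW) / GT, with counts summed over frames."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    errors = false_negatives + false_positives + id_switches
    # MOTA can be negative when errors outnumber ground-truth objects.
    return 1.0 - errors / num_gt
```

For example, with made-up counts of 5 misses, 3 false positives, and 2 identity switches over 100 ground-truth objects, `mota(5, 3, 2, 100)` evaluates to roughly 0.90.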

Abstract

Foundation-model pipelines for individual-level livestock monitoring -- combining open-vocabulary detection, promptable video segmentation, and self-supervised visual embeddings -- have raised the accuracy ceiling of precision livestock farming (PLF), but their GPU memory budgets exceed the envelope of commodity edge accelerators. To close this gap, the 446M-parameter Perception Encoder (PE-ViT-L+) backbone of SAM 3 is distilled into a 40.66M-parameter multi-scale student through three mechanisms: a Feature Pyramid Network student encoder built on TinyViT-21M-512, a four-term direction-then-scale distillation loss, and backbone-substitution inference with sliding-window session pruning that bounds streaming GPU memory growth. The DINOv3 family includes a pre-distilled ViT-S/16 variant (21.6M parameters) released alongside a 6716M-parameter ViT-7B teacher; the ViT-S (21M) variant is…
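The abstract credits sliding-window session pruning with bounding streaming GPU memory growth. The sketch below illustrates the generic technique under stated assumptions: per-frame state is kept only for the most recent `window` frames, and older entries are evicted as new frames arrive. The class `SlidingWindowSession` and its methods are hypothetical names, and the stored state is a placeholder for whatever per-frame tensors a real tracker would hold; none of this reflects the authors' implementation.

```python
from collections import OrderedDict


class SlidingWindowSession:
    """Hypothetical per-frame state store capped at `window` recent frames."""

    def __init__(self, window):
        if window < 1:
            raise ValueError("window must be at least 1")
        self.window = window
        self._frames = OrderedDict()

    def add(self, frame_idx, state):
        # Insert or refresh the frame, then evict the oldest entries
        # so the store never holds more than `window` frames.
        self._frames[frame_idx] = state
        self._frames.move_to_end(frame_idx)
        while len(self._frames) > self.window:
            self._frames.popitem(last=False)

    def frame_indices(self):
        return list(self._frames)
```

Because the store never exceeds `window` entries, memory held for session state stays constant in the length of the stream rather than growing with every processed frame.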
