VM-RTDETR: Advancing DETR with Vision State-Space Duality and Multi-Scale Fusion for Robust Pig Detection
Wangli Hao, Shu-Ai Xu, Hao Shu, Hanwei Li, Meng Han, Fuzhong Li, Yanhong Liu

TL;DR
This paper introduces VM-RTDETR, a new object detection model that improves pig detection in farming by combining global and local image features.
Contribution
VM-RTDETR introduces a Vision State-Space Duality backbone and a Multi-Scale Encoder for robust pig detection in complex environments.
Findings
VM-RTDETR outperforms RT-DETR by up to 2.35% in average precision on a pig farm dataset.
The model effectively handles scale changes, occlusions, and complex backgrounds in livestock monitoring.
The VSSD and M-Encoder combination achieves more comprehensive feature representation for detection.
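As a rough illustration of what "parallel multi-scale features" can mean, the sketch below runs several branches with different receptive fields over the same feature map and stacks their outputs per cell. It is not the paper's M-Encoder: the mean filters stand in for learned convolutions, and the names box_filter, multi_scale_features, and kernel_sizes are hypothetical, chosen only for this example.

```python
def box_filter(feature_map, k):
    """Mean over a k x k window centred on each cell (k odd, zero padding outside)."""
    h, w = len(feature_map), len(feature_map[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total = 0.0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        total += feature_map[ii][jj]
            out[i][j] = total / (k * k)
    return out


def multi_scale_features(feature_map, kernel_sizes=(1, 3, 5)):
    """Run each branch independently on the same input, then stack per cell."""
    # Assumption: a fixed mean filter replaces each learned branch.
    branches = [box_filter(feature_map, k) for k in kernel_sizes]
    h, w = len(feature_map), len(feature_map[0])
    return [[[b[i][j] for b in branches] for j in range(w)] for i in range(h)]


# Toy 5 x 5 feature map with a single activation at the centre.
toy_map = [[0.0] * 5 for _ in range(5)]
toy_map[2][2] = 1.0
fused = multi_scale_features(toy_map)
print(fused[2][2])  # responses of the three branches at the centre cell
print(fused[0][0])  # only the widest branch sees the activation from a corner
```

Each cell ends up with one value per branch, so small windows preserve local detail while large windows carry surrounding context. That is the general intuition behind fusing scales, whatever the actual branch design is.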
Abstract
Robust pig detection in complex farming environments requires a unified representation of both global semantics and local details, which remains a challenge. This paper proposes VM-RTDETR, an enhanced version of RT-DETR (a transformer-based real-time object detector) that addresses this challenge by combining a Vision State-Space Duality (VSSD) backbone with a Multi-scale Encoder (M-Encoder). The VSSD module breaks through the causal constraints of traditional state-space models to efficiently capture long-range dependencies and global context within an image, while the M-Encoder extracts multi-scale features in parallel to handle appearance variations. This collaboration yields a detector that robustly handles scale changes, occlusions, and complex backgrounds. On challenging datasets, VM-RTDETR elevates the state of the art, surpassing strong…
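To make the "causal constraint" concrete, the toy sketch below contrasts a causal linear recurrence, where each position only sees earlier tokens, with a simple non-causal variant. The non-causal version here combines a forward and a backward scan, which is a common generic workaround and is used purely for illustration; it is not claimed to be how VSSD removes causality. The function names and the scalar decay parameter are assumptions made for this example.

```python
def causal_scan(x, decay):
    """h[t] = decay * h[t-1] + x[t], so position t only sees x[0..t]."""
    h, out = 0.0, []
    for value in x:
        h = decay * h + value
        out.append(h)
    return out


def non_causal_scan(x, decay):
    """Combine forward and backward scans so every position sees all tokens."""
    forward = causal_scan(x, decay)
    backward = causal_scan(x[::-1], decay)[::-1]
    # x[t] is counted once in each direction, so subtract it once.
    return [f + b - v for f, b, v in zip(forward, backward, x)]


tokens = [1.0, 0.0, 0.0, 2.0]  # toy sequence of flattened image patches
print(causal_scan(tokens, 0.5))      # [1.0, 0.5, 0.25, 2.125]
print(non_causal_scan(tokens, 0.5))  # [1.25, 1.0, 1.25, 2.125]
```

In the causal output the first position is unaffected by the last token, whereas in the non-causal output it picks up a decayed contribution from it. For images, where patches have no natural left-to-right order, this kind of global visibility is the property a non-causal formulation aims to provide.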
Taxonomy
Topics: Animal Behavior and Welfare Studies · Advanced Neural Network Applications · Wildlife Ecology and Conservation
