# A Multi-Scale Vision–Sensor Collaborative Framework for Small-Target Insect Pest Management

**Authors:** Chongyu Wang, Yicheng Chen, Shangshan Chen, Ranran Chen, Ziqi Xia, Ruoyu Hu, Yihong Song

PMC · DOI: 10.3390/insects17030281 · 2026-03-04

## TL;DR

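This paper proposes a framework that fuses multi-scale image features with environmental sensor readings (temperature, humidity, illumination) to recognize small insect pests, reaching 93.1% accuracy on a dataset collected from real farmland and greenhouse environments.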

## Contribution

A novel multi-scale vision–sensor collaborative framework that integrates visual and environmental data for robust small-target pest recognition.

## Key findings

- The proposed method achieves 93.1% accuracy and 91.6% F1-score on a real-world multimodal pest dataset.
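- Introducing environmental factors (temperature, humidity, illumination) as priors improves classification robustness; ablation studies confirm the contribution of environmental prior modulation.
- The framework outperforms traditional machine learning approaches, single-scale and single-modality deep learning models, and multi-scale vision baselines without environmental priors.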

## Abstract

Small-target insect pests pose a major challenge to intelligent agricultural monitoring due to their tiny size, complex backgrounds, and strong dependence on environmental conditions. To address these issues, this study proposes a multi-scale vision–sensor collaborative framework that integrates visual imagery with environmental sensing data for accurate pest recognition. The model first captures fine-grained pest features through multi-scale visual representation learning, and then introduces environmental factors—such as temperature, humidity, and illumination—as prior information to guide feature discrimination. A collaborative fusion mechanism is further designed to enhance cross-modal consistency and improve classification robustness. Experiments conducted on a real multimodal pest dataset collected from farmland and greenhouse environments demonstrate that the proposed method achieves 93.1% accuracy, 92.0% precision, 91.2% recall, and a 91.6% F1-score, outperforming conventional machine learning and single-modality deep learning approaches. These results indicate that integrating ecological sensing with computer vision can provide a reliable technical pathway for early pest detection and precision agricultural management.
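As a quick consistency check on the reported numbers, the F1-score can be recomputed as the harmonic mean of the reported precision and recall. The sketch below assumes the three figures are aggregated in the same way; if they are macro-averaged per class, the macro F1 need not equal this value exactly. The function name is illustrative only.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the summary above, expressed as fractions.
reported_precision = 0.920
reported_recall = 0.912

f1 = f1_from_precision_recall(reported_precision, reported_recall)
print(f"F1 = {f1:.3f}")  # prints "F1 = 0.916"
```

The result matches the reported 91.6% to three decimal places, so the four headline metrics are mutually consistent under that assumption.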

In complex agricultural production environments, small-target pests—characterized by tiny scales, strong background confusion, and close dependence on environmental conditions—pose major challenges to precise monitoring and green pest control. To facilitate the transition from experience-driven to data-driven pest management, a multi-scale vision–sensor collaborative recognition method is proposed for field and protected agriculture scenarios to improve the accuracy and stability of small-target pest recognition under complex conditions. The method jointly models multi-scale visual representations and pest ecological mechanisms: a multi-scale visual feature module enhances fine-grained texture and morphological cues of small targets in deep networks, alleviating feature sparsity and scale mismatch, while environmental sensor data, including temperature, humidity, and illumination, are introduced as priors to modulate visual features and explicitly incorporate ecological constraints into the discrimination process. Stable multimodal fusion and pest category prediction are then achieved through a vision–sensor collaborative discrimination module. Experiments on a multimodal dataset collected from real farmland and greenhouse environments in Linhe District, Bayannur City, Inner Mongolia, demonstrate that the proposed method achieves approximately 93.1% accuracy, 92.0% precision, 91.2% recall, and a 91.6% F1-score on the test set, significantly outperforming traditional machine learning approaches, single-scale deep learning models, and multi-scale vision baselines without environmental priors. Category-level evaluations show balanced performance across multiple small-target pests, including aphids, thrips, whiteflies, leafhoppers, spider mites, and leaf beetles, while ablation studies confirm the critical contributions of multi-scale visual modeling, environmental prior modulation, and vision–sensor collaborative discrimination.
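The abstract does not give the architecture in detail, so the following is only a minimal sketch of the general idea: pool a visual feature map at several scales, let sensor readings scale and shift the result, then classify. The modulation shown is a generic feature-wise affine (FiLM-style) transform swapped in for illustration, not the authors' module. All function names, shapes, weights, and sensor values are hypothetical, and the weights are random rather than trained.

```python
import math
import random


def grid_average_pool(channel, g):
    """Average one H x W channel (list of rows) over a g x g grid of cells."""
    h, w = len(channel), len(channel[0])
    ch, cw = h // g, w // g  # assumes H and W are divisible by g
    pooled = []
    for gi in range(g):
        for gj in range(g):
            cell = [channel[i][j]
                    for i in range(gi * ch, (gi + 1) * ch)
                    for j in range(gj * cw, (gj + 1) * cw)]
            pooled.append(sum(cell) / len(cell))
    return pooled


def multi_scale_descriptor(feature_map, grid_sizes=(1, 2, 4)):
    """Concatenate pooled values from every channel at every grid size."""
    return [v for g in grid_sizes for channel in feature_map
            for v in grid_average_pool(channel, g)]


def matvec(vec, matrix):
    """Row vector (length n) times matrix (n rows, m columns) -> list of length m."""
    return [sum(v * row[k] for v, row in zip(vec, matrix))
            for k in range(len(matrix[0]))]


def film_modulate(visual, sensors, w_gamma, w_beta):
    """Scale and shift visual features using projections of the sensor vector."""
    gamma = [1.0 + x for x in matvec(sensors, w_gamma)]  # identity when sensors are zero
    beta = matvec(sensors, w_beta)
    return [g * v + b for g, v, b in zip(gamma, visual, beta)]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng, scale=0.1):
    """Hypothetical untrained weights drawn from a zero-mean Gaussian."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


# The six pest categories named in the abstract.
PEST_CLASSES = ["aphid", "thrips", "whitefly", "leafhopper", "spider mite", "leaf beetle"]

rng = random.Random(0)
# Stand-in for a backbone output: 4 channels of 8 x 8 activations (made-up data).
feature_map = [[[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(8)]
               for _ in range(4)]
# Made-up standardized readings: temperature, humidity, illumination.
sensors = [0.4, -0.2, 1.1]

visual = multi_scale_descriptor(feature_map)  # 4 channels * (1 + 4 + 16) cells = 84 values
dim = len(visual)
w_gamma = random_matrix(len(sensors), dim, rng)
w_beta = random_matrix(len(sensors), dim, rng)
w_cls = random_matrix(dim, len(PEST_CLASSES), rng)

modulated = film_modulate(visual, sensors, w_gamma, w_beta)
probs = softmax(matvec(modulated, w_cls))
for name, p in zip(PEST_CLASSES, probs):
    print(f"{name}: {p:.3f}")
```

With all sensor readings at zero, `film_modulate` returns the visual features unchanged, which makes the environmental branch a pure adjustment on top of the vision pathway. That is one plausible reading of "priors that modulate visual features", not a confirmed detail of the paper.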

## Linked entities

- **Species:** Thrips [taxon 45057]

## Full-text entities

- **Species:** Tetranychidae (spider mites, family) [taxon 32262], Aphidomorpha (aphids, infraorder) [taxon 33380], Chrysomelidae (leaf beetles, family) [taxon 27439], Cicadellidae (leafhoppers, family) [taxon 30102]

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13027269/full.md

---
Source: https://tomesphere.com/paper/PMC13027269