# Tree-Guided Transformer for Sensor-Based Ecological Image Feature Extraction and Multitarget Recognition in Agricultural Systems

**Authors:** Yiqiang Sun, Zigang Huang, Linfeng Yang, Zihuan Wang, Mingzhuo Ruan, Jingchao Suo, Shuo Yan

PMC · DOI: 10.3390/s25196206 · 2025-10-07

## TL;DR

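This paper introduces a tree-guided Transformer framework that improves multitarget recognition of pests and their predators in sensor-acquired farmland images by combining a hierarchical ecological taxonomy with co-occurrence priors from an ecological knowledge graph.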

## Contribution

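A tree-guided Transformer with a knowledge-augmented co-attention mechanism: a Phylum–Family–Species taxonomy guides prompt-driven semantic reasoning, and an ecological knowledge graph embeds co-occurrence priors into the visual representations, improving ecological image feature extraction and multitarget recognition.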
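
One way to picture how a co-occurrence prior could steer attention is to add a log-prior bias to scaled dot-product attention scores. The sketch below is only an illustration under that assumption, not the authors' implementation: the function name `prior_biased_attention`, the toy vectors, and the co-occurrence values are all hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def prior_biased_attention(query, keys, prior, eps=1e-9):
    """Attention weights of one query over several keys.

    Hypothetical helper: each score is a scaled dot product plus
    log(prior), so candidates that co-occur more often with the
    query's category receive more weight.
    """
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d) + math.log(p + eps)
        for key, p in zip(keys, prior)
    ]
    return softmax(scores)


# Two visually identical candidates; only the (made-up) prior differs.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [1.0, 0.0]]
prior = [0.9, 0.1]  # assumed co-occurrence frequencies
weights = prior_biased_attention(query, keys, prior)
```

Because the two keys are identical here, the resulting weights are driven entirely by the prior, which makes the effect of the bias easy to see in isolation.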

## Key findings

- In image classification, the framework achieves 90.4% precision, 86.7% recall, and 88.5% F1-score, along with 82.3% hierarchical accuracy.
- In detection, it attains 91.6% precision and 86.3% mAP@50, with 80.5% co-occurrence accuracy.
- For hierarchical reasoning and knowledge-enhanced tasks, F1-scores reach 88.5% and 89.7%, respectively.
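
The classification figures can be cross-checked with the standard definition of F1 as the harmonic mean of precision and recall. The snippet below assumes the reported F1-score is computed directly from the reported precision and recall; if the paper averages F1 per class, the values could differ slightly.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reported classification precision and recall from the key findings.
f1 = f1_score(0.904, 0.867)
print(f"{f1:.3f}")  # 0.885
```

The result matches the reported 88.5% F1-score to one decimal place.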

## Abstract

Farmland ecosystems present complex pest–predator co-occurrence patterns, posing significant challenges for image-based multitarget recognition and ecological modeling in sensor-driven computer vision tasks. To address these issues, this study introduces a tree-guided Transformer framework enhanced with a knowledge-augmented co-attention mechanism, enabling effective feature extraction from sensor-acquired images. A hierarchical ecological taxonomy (Phylum–Family–Species) guides prompt-driven semantic reasoning, while an ecological knowledge graph enriches visual representations by embedding co-occurrence priors. A multimodal dataset containing 60 pest and predator categories with annotated images and semantic descriptions was constructed for evaluation. Experimental results demonstrate that the proposed method achieves 90.4% precision, 86.7% recall, and 88.5% F1-score in image classification, along with 82.3% hierarchical accuracy. In detection tasks, it attains 91.6% precision and 86.3% mAP@50, with 80.5% co-occurrence accuracy. For hierarchical reasoning and knowledge-enhanced tasks, F1-scores reach 88.5% and 89.7%, respectively. These results highlight the framework’s strong capability in extracting structured, semantically aligned image features under real-world sensor conditions, offering an interpretable and generalizable approach for intelligent agricultural monitoring.
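
To make the idea of scoring predictions against a Phylum–Family–Species hierarchy concrete, the sketch below computes the fraction of taxonomy levels on which a prediction agrees with the ground truth. The abstract does not define how hierarchical accuracy is calculated, so this level-wise agreement is an assumed definition; the `TAXONOMY` table and the function names are hypothetical and cover only four of the species mentioned in the paper.

```python
# Toy taxonomy: species -> (phylum, family, species).
# Illustrative subset only; the paper's dataset covers 60 categories.
TAXONOMY = {
    "Ostrinia furnacalis": ("Arthropoda", "Crambidae", "Ostrinia furnacalis"),
    "Cnaphalocrocis medinalis": ("Arthropoda", "Crambidae", "Cnaphalocrocis medinalis"),
    "Helicoverpa armigera": ("Arthropoda", "Noctuidae", "Helicoverpa armigera"),
    "Plutella xylostella": ("Arthropoda", "Plutellidae", "Plutella xylostella"),
}


def path_agreement(true_species, pred_species):
    """Fraction of taxonomy levels where prediction and truth agree."""
    true_path = TAXONOMY[true_species]
    pred_path = TAXONOMY[pred_species]
    matches = sum(t == p for t, p in zip(true_path, pred_path))
    return matches / len(true_path)


def hierarchical_accuracy(pairs):
    """Mean level-wise agreement over (true, predicted) species pairs."""
    return sum(path_agreement(t, p) for t, p in pairs) / len(pairs)


pairs = [
    ("Ostrinia furnacalis", "Ostrinia furnacalis"),       # all 3 levels match
    ("Ostrinia furnacalis", "Cnaphalocrocis medinalis"),  # phylum and family match
    ("Helicoverpa armigera", "Plutella xylostella"),      # only phylum matches
]
score = hierarchical_accuracy(pairs)
```

Under this assumed metric, confusing two species within the same family is penalized less than confusing species from different families, which is the kind of partial credit a tree-guided model is meant to earn.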

## Full-text entities

- **Diseases:** injury to (MESH:D014947), disease (MESH:D004194), Pests (MESH:D029021)
- **Species:** Ostrinia furnacalis (Asian corn borer, species) [taxon 93504], Homo sapiens (human, species) [taxon 9606], Hymenoptera (hymenopterans, order) [taxon 7399], Cnaphalocrocis medinalis (rice leaffolder, species) [taxon 437488], Aphidomorpha (aphids, infraorder) [taxon 33380], Coccinellidae (lady beetles, family) [taxon 7080], Brassica oleracea (wild cabbage, species) [taxon 3712], Helicoverpa armigera (American bollworm, species) [taxon 29058], Plutella xylostella (cabbage moth, species) [taxon 51655]

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12526750/full.md

---
Source: https://tomesphere.com/paper/PMC12526750