ROI-Aware Multiscale Cross-Attention Vision Transformer for Pest Image Identification
Ga-Eun Kim, Chang-Hwan Son

TL;DR
This paper introduces ROI-ViT, a novel vision transformer that effectively detects and focuses on pest regions in images with complex backgrounds and small pests, significantly improving identification accuracy.
Contribution
The paper proposes a dual-branch ROI-aware multiscale cross-attention vision transformer with ROI generation and fusion, enhancing pest detection robustness and accuracy over existing models.
Findings
Achieves state-of-the-art accuracy on multiple pest datasets.
Demonstrates robustness to complex backgrounds and small pest sizes.
Outperforms existing models like MViT, PVT, DeiT, Swin-ViT, and EfficientNet.
Abstract
The pests captured with imaging devices may be relatively small in size compared to the entire images, and complex backgrounds have colors and textures similar to those of the pests, which hinders accurate feature extraction and makes pest identification challenging. The key to pest identification is to create a model capable of detecting regions of interest (ROIs) and transforming them into better ones for attention and discriminative learning. To address these problems, we will study how to generate and update the ROIs via multiscale cross-attention fusion as well as how to be highly robust to complex backgrounds and scale problems. Therefore, we propose a novel ROI-aware multiscale cross-attention vision transformer (ROI-ViT). The proposed ROI-ViT is designed using dual branches, called Pest and ROI branches, which take different types of maps as input: Pest images and ROI maps. To…
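The abstract is truncated before it explains how the Pest and ROI branches are fused, so what follows is only a generic illustration of cross-attention between two token sets, not the authors' implementation. It is a minimal NumPy sketch of single-head scaled dot-product cross-attention, assuming the Pest branch supplies the queries and the ROI branch supplies the keys and values. The token counts, the embedding size, and the names `cross_attention`, `pest_tokens`, and `roi_tokens` are all hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention(query_tokens, context_tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention.

    Queries come from one token set; keys and values come from another.
    Returns the attended tokens and the attention weights.
    """
    q = query_tokens @ w_q            # (n_query, dim)
    k = context_tokens @ w_k          # (n_context, dim)
    v = context_tokens @ w_v          # (n_context, dim)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)  # each query row sums to 1
    return weights @ v, weights


# Hypothetical sizes: the two branches tokenize at different scales.
rng = np.random.default_rng(0)
dim = 16
pest_tokens = rng.normal(size=(49, dim))    # assumed 7x7 patch grid
roi_tokens = rng.normal(size=(196, dim))    # assumed 14x14 patch grid

# Random projection matrices stand in for learned weights.
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) / np.sqrt(dim) for _ in range(3))

fused, weights = cross_attention(pest_tokens, roi_tokens, w_q, w_k, w_v)
print(fused.shape, weights.shape)
```

Because the queries and the context can have different token counts, this form of attention can mix feature maps of different resolutions without resampling either one, which is one plausible reason to use it for multiscale fusion.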
Taxonomy
Topics: Smart Agriculture and AI · Date Palm Research Studies · Mosquito-borne diseases and control
Methods: Multi-Head Attention · Attention Is All You Need · Depthwise Convolution · Pointwise Convolution · Depthwise Separable Convolution · Sigmoid Activation · 1x1 Convolution · Dropout · Batch Normalization
