Pest-Thinker: Learning to Think and Reason like Entomologists via Reinforcement Learning

Xueheng Li; Yu Wang; Tao Hu; Ji Huang; Ke Cao; Qize Yang; Rui Li; Jie Zhang; Chengjun Xie

arXiv:2605.06121 · cs.CV · May 8, 2026

TL;DR

Pest-Thinker is a reinforcement learning framework that trains multimodal large language models to reason over fine-grained pest morphology for crop pest identification, supported by two new benchmarks (QFSD and AgriInsect) and synthesized Chain-of-Thought reasoning trajectories.

Contribution

The paper introduces Pest-Thinker, a framework that combines knowledge-driven reinforcement learning (RL), two new pest benchmarks, and Chain-of-Thought (CoT) reasoning to improve pest recognition and morphological understanding.

Findings

01. Significant improvement in pest morphological reasoning accuracy.

02. Effective generalization to out-of-domain pest species.

03. Enhanced visual understanding through structured reasoning.

Abstract

Pest-induced crop losses pose a major threat to global food security and sustainable agricultural development. While recent advances in Multimodal Large Language Models (MLLMs) have shown strong potential for visual understanding and smart agriculture, their direct application to pest recognition remains limited due to the domain's unique challenges such as high inter-species complexity, intra-species variability, and the scarcity of expert-annotated data. In this work, we introduce Pest-Thinker, a knowledge-driven reinforcement learning (RL) framework that enables MLLMs to reason over fine-grained pest morphology. We first construct two high-definition pest benchmarks, QFSD and AgriInsect, comprising diverse species and expert-annotated morphological traits. Leveraging these datasets, we synthesize Chain-of-Thought (CoT) reasoning trajectories to facilitate structured learning of…
