AgroTools: A Benchmark for Tool-Augmented Multimodal Agents in Agriculture

Zi Ye; Yibin Wen; Xiaoya Fan; Xinyu Zhang; Jing Wu; Kun Zeng; Zurong Mai; Jiarui Zhang; Bohan Shi; Juepeng Zheng; Jianxi Huang; Yutong Lu; Haohuan Fu

arXiv:2605.22366·cs.CV·May 22, 2026

TL;DR

AgroTools is a benchmark for evaluating tool-augmented multimodal agents in agriculture. It scores tool use, process-level execution quality, and task success, and it exposes the limitations of current models.

Contribution

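Introduces AgroTools, a benchmark of 539 question-answer instances and 1,097 agricultural images across five task families, with structured tool-use traces and an executable environment of 14 agricultural tools for assessing tool-augmented multimodal agents.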

Findings

01

Current models are unreliable in agricultural tool-use tasks.

02

The identified bottlenecks lie in tool planning and execution recovery.

03

The benchmark and evaluation code are available on Hugging Face.

Abstract

Agricultural decision-making increasingly requires multimodal systems that can transform visual observations into reliable, executable actions. However, existing agricultural multimodal benchmarks mainly evaluate final-answer correctness and provide limited support for assessing whether models can use external tools to complete precision-sensitive workflows. In this paper, we introduce AgroTools, a benchmark for evaluating tool-augmented multimodal agents in agriculture. AgroTools contains 539 question-answer instances paired with 1,097 heterogeneous agricultural images, spanning five task families and an executable environment of 14 agricultural tools. Each query is annotated with structured tool-use traces, enabling a dual-view evaluation of both process-level execution quality and outcome-level task success. We benchmark 9 open-source and 4 closed-source multimodal large language…
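The dual-view evaluation described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the tool names (`detect_pests`, `count_objects`, `recommend_treatment`), the `ToolCall` structure, and both scoring rules are hypothetical and are not taken from the paper. The process view is approximated as the in-order overlap between a predicted tool-call trace and the annotated reference trace (longest common subsequence divided by reference length), and the outcome view as a normalized exact match on the final answer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    """One step of a tool-use trace (hypothetical structure)."""
    name: str
    args: tuple  # sorted (key, value) pairs so argument order does not matter


def call(name, **kwargs):
    """Build a ToolCall with canonically ordered arguments."""
    return ToolCall(name, tuple(sorted(kwargs.items())))


def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def process_score(predicted, reference):
    """Assumed process metric: share of reference calls reproduced in order."""
    if not reference:
        return 1.0 if not predicted else 0.0
    return lcs_length(predicted, reference) / len(reference)


def outcome_score(predicted_answer, gold_answer):
    """Assumed outcome metric: case-insensitive exact match."""
    return float(predicted_answer.strip().lower() == gold_answer.strip().lower())


def evaluate(pred_trace, ref_trace, pred_answer, gold_answer):
    """Return both views side by side instead of collapsing them into one number."""
    return {
        "process": process_score(pred_trace, ref_trace),
        "outcome": outcome_score(pred_answer, gold_answer),
    }


# Hypothetical annotated trace and a model trace that skips the counting step.
reference = [
    call("detect_pests", image="field_01.jpg"),
    call("count_objects", label="aphid"),
    call("recommend_treatment", pest="aphid"),
]
predicted = [
    call("detect_pests", image="field_01.jpg"),
    call("recommend_treatment", pest="aphid"),
]

result = evaluate(predicted, reference, " Apply treatment A ", "apply treatment a")
print(result)
```

Keeping the two scores separate is the point of the sketch: in this example the model reaches the correct final answer while reproducing only two of the three reference tool calls, a gap that a final-answer-only benchmark would not show.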

Code & Models

Repositories

https://huggingface.co/datasets/AgroTools/AgroTools