AgriEval: A Comprehensive Chinese Agricultural Benchmark for Large Language Models

Lian Yan; Haotian Wang; Chen Tang; Haifeng Liu; Tianyang Sun; Liangliang Liu; Yi Guan; Jingchi Jiang

arXiv:2507.21773 · cs.CL · July 30, 2025

TL;DR

AgriEval is a comprehensive Chinese agricultural benchmark with diverse question formats, covering six major categories, 29 subcategories, and four cognitive scenarios. It is designed to evaluate and improve large language models in agricultural applications.

Contribution

This paper introduces AgriEval, the first extensive Chinese agricultural benchmark with high-quality data, diverse formats, and broad coverage for evaluating LLMs in agriculture.

Findings

01

Most LLMs struggle to surpass 60% accuracy on AgriEval.

02

Performance varies significantly across different models and categories.

03

The authors propose strategies for improving LLM performance in agriculture.

Abstract

In the agricultural domain, the deployment of large language models (LLMs) is hindered by the lack of training data and evaluation benchmarks. To mitigate this issue, we propose AgriEval, the first comprehensive Chinese agricultural benchmark, with three main characteristics: (1) Comprehensive Capability Evaluation. AgriEval covers six major agricultural categories and 29 subcategories, addressing four core cognitive scenarios: memorization, understanding, inference, and generation. (2) High-Quality Data. The dataset is curated from university-level examinations and assignments, providing a natural and robust benchmark for assessing the capacity of LLMs to apply knowledge and make expert-like decisions. (3) Diverse Formats and Extensive Scale. AgriEval comprises 14,697 multiple-choice questions and 2,167 open-ended questions, establishing it as the…
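The abstract fixes the benchmark's composition (six categories, 29 subcategories, four cognitive scenarios, 14,697 multiple-choice and 2,167 open-ended questions), and the findings are stated as accuracy figures that vary by model and category. The sketch below shows one plausible way to score the multiple-choice portion overall and per group. It is a minimal illustration under stated assumptions, not the authors' evaluation harness: the record layout `MCQItem`, its field names, the placeholder labels such as `cat_a`, and the exact-match scoring on option letters are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

# Dataset sizes as reported in the AgriEval abstract.
NUM_MULTIPLE_CHOICE = 14_697
NUM_OPEN_ENDED = 2_167
TOTAL_QUESTIONS = NUM_MULTIPLE_CHOICE + NUM_OPEN_ENDED  # 16,864


@dataclass(frozen=True)
class MCQItem:
    """Hypothetical record layout for one multiple-choice question."""
    category: str     # placeholder label for one of the six major categories
    subcategory: str  # placeholder label for one of the 29 subcategories
    scenario: str     # cognitive scenario, e.g. "memorization" or "inference"
    answer: str       # gold option letter, e.g. "A"


def _normalize(choice: str) -> str:
    # Assumed scoring rule: exact match on the option letter,
    # ignoring case and surrounding whitespace.
    return choice.strip().upper()


def _check(items, predictions) -> None:
    if not items:
        raise ValueError("no items to score")
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")


def overall_accuracy(items, predictions) -> float:
    """Micro-averaged accuracy over all scored items."""
    _check(items, predictions)
    hits = sum(
        _normalize(pred) == _normalize(item.answer)
        for item, pred in zip(items, predictions)
    )
    return hits / len(items)


def accuracy_by(items, predictions, key: str) -> dict:
    """Accuracy per group, where groups are the values of one MCQItem field."""
    _check(items, predictions)
    correct = defaultdict(int)
    total = defaultdict(int)
    for item, pred in zip(items, predictions):
        group = getattr(item, key)
        total[group] += 1
        correct[group] += int(_normalize(pred) == _normalize(item.answer))
    return {group: correct[group] / total[group] for group in total}


if __name__ == "__main__":
    # Toy data with placeholder labels, not real AgriEval questions.
    items = [
        MCQItem("cat_a", "sub_1", "memorization", "A"),
        MCQItem("cat_a", "sub_2", "inference", "C"),
        MCQItem("cat_b", "sub_3", "memorization", "B"),
        MCQItem("cat_b", "sub_3", "inference", "D"),
    ]
    preds = ["A", "B", "b", "D"]
    print(overall_accuracy(items, preds))             # 0.75
    print(accuracy_by(items, preds, "category"))      # {'cat_a': 0.5, 'cat_b': 1.0}
```

Reporting a per-group breakdown next to the overall number makes it visible when a single accuracy figure hides weak categories or scenarios. Open-ended answers would need a different scorer, which this sketch does not attempt.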
