SeedBench: A Multi-task Benchmark for Evaluating Large Language Models in Seed Science

Jie Ying; Zihong Chen; Zhefan Wang; Wanli Jiang; Chenyang Wang; Zhonghang Yuan; Haoyang Su; Huanjun Kong; Fan Yang; Nanqing Dong

arXiv:2505.13220 · cs.CL · May 20, 2025



TL;DR

SeedBench is the first multi-task benchmark designed to evaluate large language models (LLMs) in seed science, aiming to bridge the gap between AI capabilities and the complex needs of agricultural research.

Contribution

This paper introduces SeedBench, the first benchmark specialized for seed science. Developed in collaboration with domain experts, it evaluates LLMs on seed breeding and related tasks.

Findings

1. Significant gaps between current LLM capabilities and the demands of seed science tasks
2. Evaluation of 26 diverse LLMs across multiple seed science tasks
3. A foundation for future research on LLMs in seed design

Abstract

Seed science is essential for modern agriculture, directly influencing crop yields and global food security. However, challenges such as interdisciplinary complexity and high costs with limited returns hinder progress, leading to a shortage of experts and insufficient technological support. While large language models (LLMs) have shown promise across various fields, their application in seed science remains limited due to the scarcity of digital resources, complex gene-trait relationships, and the lack of standardized benchmarks. To address this gap, we introduce SeedBench -- the first multi-task benchmark specifically designed for seed science. Developed in collaboration with domain experts, SeedBench focuses on seed breeding and simulates key aspects of modern breeding processes. We conduct a comprehensive evaluation of 26 leading LLMs, encompassing proprietary, open-source, and…


Code & Models

Repositories

open-sciencelab/SeedBench (official)


Videos

SeedBench: A Multi-task Benchmark for Evaluating Large Language Models in Seed Science (Underline)

Taxonomy

Topics: Smart Agriculture and AI · ICT in Developing Communities · Topic Modeling