PlantMarkerBench: A Multi-Species Benchmark for Evidence-Grounded Plant Marker Reasoning

Sajib Acharjee Dip; Song Li; Liqing Zhang

arXiv:2605.10032·cs.CL·May 13, 2026

TL;DR

PlantMarkerBench is a multi-species benchmark for evaluating how accurately models interpret literature-grounded evidence for plant marker genes. It highlights open challenges in evidence-type classification and model reliability.

Contribution

The paper introduces a multi-species benchmark of annotated sentence-level evidence instances and defines tasks for evidence validity and evidence-type classification, supporting future research on trustworthy biological information extraction.

Findings

01  Frontier models perform well on direct expression evidence but struggle with functional and indirect evidence.

02  Evidence-type classification remains a significant challenge, with confusion among categories.

03  Open-weight models tend to have higher false positives in ambiguous biological contexts.
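The findings above refer to false positives on the validity task and to confusion among evidence-type categories. As a minimal sketch of how such quantities could be computed, assuming each instance carries a gold and a predicted validity flag plus a gold and a predicted evidence type: the class name, field names, label strings, and toy data below are all hypothetical and are not taken from the benchmark's actual schema or evaluation code.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceInstance:
    """One sentence-level instance; all field names are hypothetical."""
    gold_valid: bool   # gold marker-evidence validity
    pred_valid: bool   # model-predicted validity
    gold_type: str     # gold evidence type
    pred_type: str     # model-predicted evidence type


def false_positive_rate(instances):
    """Fraction of gold-invalid instances that the model marked as valid."""
    negatives = [x for x in instances if not x.gold_valid]
    if not negatives:
        return 0.0
    return sum(x.pred_valid for x in negatives) / len(negatives)


def type_confusion(instances):
    """Count (gold_type, pred_type) pairs over gold-valid instances."""
    return Counter(
        (x.gold_type, x.pred_type) for x in instances if x.gold_valid
    )


# Toy data, invented for illustration only.
toy = [
    EvidenceInstance(True, True, "expression", "expression"),
    EvidenceInstance(True, True, "functional", "expression"),
    EvidenceInstance(False, True, "indirect", "functional"),
    EvidenceInstance(False, False, "indirect", "indirect"),
]

fpr = false_positive_rate(toy)
confusion = type_confusion(toy)
print(fpr)
print(confusion[("functional", "expression")])
```

Off-diagonal entries of the counter, where the gold and predicted types differ, are the confusions between categories. The paper's own metrics and label set may differ from this sketch.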

Abstract

Cell-type-specific marker genes are fundamental to plant biology, yet existing resources primarily rely on curated databases or high-throughput studies without explicitly modeling the supporting evidence found in scientific literature. We introduce PlantMarkerBench, a multi-species benchmark for evaluating literature-grounded plant marker evidence interpretation from full-text biological papers. PlantMarkerBench is constructed using a modular curation pipeline integrating large-scale literature retrieval, hybrid search, species-aware biological grounding, structured evidence extraction, and targeted human review. The benchmark spans four plant species -- Arabidopsis, maize, rice, and tomato -- and contains 5,550 sentence-level evidence instances annotated for marker-evidence validity, evidence type, and support strength. We define two benchmark tasks: determining whether a candidate…
