BABE: Biology Arena BEnchmark
Junting Zhou, Jin Chen, Linfeng Hao, Denghui Cao, Zheyu Wang, Qiguang Chen, Chaoyou Fu, Jiaze Chen, Yuchen Wu, Ge Zhang, Mingxuan Wang, Wenhao Huang, Tong Yang

TL;DR
BABE is a new benchmark for evaluating how well biological AI systems perform experimental reasoning, with tasks built from peer-reviewed research papers and real-world biological studies.
Contribution
The paper introduces a comprehensive benchmark, grounded in real-world research, for assessing experimental reasoning in biological AI, addressing a gap in existing evaluation methods.
Findings
BABE effectively measures causal reasoning and cross-scale inference.
Models show varying proficiency, highlighting areas for improvement.
The benchmark promotes development of more scientifically capable AI systems.
Abstract
The rapid evolution of large language models (LLMs) has expanded their capabilities from basic dialogue to advanced scientific reasoning. However, existing benchmarks in biology often fail to assess a critical skill required of researchers: the ability to integrate experimental results with contextual knowledge to derive meaningful conclusions. To address this gap, we introduce BABE (Biology Arena BEnchmark), a comprehensive benchmark designed to evaluate the experimental reasoning capabilities of biological AI systems. BABE is uniquely constructed from peer-reviewed research papers and real-world biological studies, ensuring that tasks reflect the complexity and interdisciplinary nature of actual scientific inquiry. BABE challenges models to perform causal reasoning and cross-scale inference. Our benchmark provides a robust framework for assessing how well AI systems can reason like…
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Explainable Artificial Intelligence (XAI) · Machine Learning in Bioinformatics
