Can Large Language Models Derive New Knowledge? A Dynamic Benchmark for Biological Knowledge Discovery
Chaoqun Yang, Xinyu Lin, Shulin Li, Wenjie Wang, Ruihan Guo, Fuli Feng, Tat-Seng Chua

TL;DR
This paper introduces DBench-Bio, a dynamic, automated benchmark for evaluating AI's ability to discover new biological knowledge, addressing limitations of static datasets and outdated evaluation methods.
Contribution
We propose a novel, fully automated, monthly-updated benchmark that assesses AI's capacity for biological knowledge discovery using a three-stage pipeline.
Findings
Current state-of-the-art (SOTA) models show limitations in discovering new knowledge.
DBench-Bio provides a living resource for ongoing evaluation.
The framework can be adapted to other scientific domains.
Abstract
Recent advancements in Large Language Model (LLM) agents have demonstrated remarkable potential in automatic knowledge discovery. However, rigorously evaluating an AI's capacity for knowledge discovery remains a critical challenge. Existing benchmarks predominantly rely on static datasets, leading to inevitable data contamination where models have likely seen the evaluation knowledge during training. Furthermore, the rapid release cycles of modern LLMs render static benchmarks quickly outdated, failing to assess the ability to discover truly new knowledge. To address these limitations, we propose DBench-Bio, a dynamic and fully automated benchmark designed to evaluate AI's biological knowledge discovery ability. DBench-Bio employs a three-stage pipeline: (1) data acquisition of rigorous, authoritative paper abstracts; (2) QA extraction utilizing LLMs to synthesize scientific hypothesis…
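The contamination argument in the abstract suggests a simple selection rule: a question only tests discovery if its source paper appeared after the evaluated model's training cutoff. The Python sketch below illustrates that rule under stated assumptions. It is not the authors' pipeline; the `QAItem` record, the `uncontaminated` function, the placeholder items, and the cutoff date are all hypothetical names and values introduced for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class QAItem:
    """Hypothetical record: one question-answer pair plus its source paper's date."""
    question: str
    answer: str
    published: date


def uncontaminated(items: list[QAItem], training_cutoff: date) -> list[QAItem]:
    """Keep only items whose source paper appeared strictly after the cutoff."""
    return [item for item in items if item.published > training_cutoff]


# Placeholder items; a real benchmark would extract these from paper abstracts.
items = [
    QAItem("Q1 (placeholder)", "A1 (placeholder)", date(2024, 3, 15)),
    QAItem("Q2 (placeholder)", "A2 (placeholder)", date(2025, 1, 10)),
    QAItem("Q3 (placeholder)", "A3 (placeholder)", date(2025, 2, 2)),
]

# Assumed training cutoff for the model under evaluation.
fresh = uncontaminated(items, training_cutoff=date(2024, 12, 31))
print([item.question for item in fresh])
```

Under this rule a model with a later cutoff sees a smaller usable set, which is one reason a regularly refreshed benchmark is needed to keep evaluating newer models.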
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Machine Learning in Materials Science · Biomedical Text Mining and Ontologies
