BioKGBench: A Knowledge Graph Checking Benchmark of AI Agent for Biomedical Science

Xinna Lin; Siqi Ma; Junjie Shan; Xiaojing Zhang; Shell Xu Hu; Tiannan Guo; Stan Z. Li; Kaicheng Yu

arXiv:2407.00466 · cs.CL · July 2, 2024 · 1 citation

TL;DR

BioKGBench introduces a novel benchmark for evaluating AI agents in biomedical science by assessing their ability to understand scientific literature and verify facts using knowledge graphs, revealing significant gaps in current agent performance.

Contribution

The paper proposes a new benchmark, BioKGBench, that evaluates biomedical AI agents on scientific claim verification and knowledge graph question-answering, addressing limitations of existing QA-based assessments.

Findings

01

State-of-the-art agents perform poorly on the benchmark.

02

Over 90 factual errors found in a popular knowledge graph.

03

The simple BKGAgent baseline shows promising results.

Abstract

Pursuing artificial intelligence for biomedical science, a.k.a. AI Scientist, draws increasing attention, where one common approach is to build a copilot agent driven by Large Language Models (LLMs). However, to evaluate such systems, people rely either on direct Question-Answering (QA) with the LLM itself or on biomedical experiments. How to precisely benchmark biomedical agents from an AI Scientist perspective remains largely unexplored. To this end, we draw inspiration from one of the most important abilities of scientists, understanding the literature, and introduce BioKGBench. In contrast to traditional evaluation benchmarks that focus only on factual QA, where LLMs are known to have hallucination issues, we first disentangle "Understanding Literature" into two atomic abilities: i) "Understanding" the unstructured text from research papers by performing scientific claim…

Code & Models

Repositories

westlake-autolab/biokgbench.github.io (Official)

Taxonomy

Topics: Biomedical Text Mining and Ontologies · Artificial Intelligence in Healthcare and Education