FlyAOC: Evaluating Agentic Ontology Curation of Drosophila Scientific Knowledge Bases
Xingjian Zhang, Sophia Moylan, Ziyang Xiong, Qiaozhu Mei, Yichen Luo, Jiaqi W. Ma

TL;DR
FlyAOC introduces FlyBench, a benchmark for evaluating AI agents on the end-to-end curation of Drosophila scientific knowledge, which requires retrieving, reasoning over, and annotating findings from a large literature corpus.
Contribution
This paper presents FlyBench, the first benchmark for end-to-end agentic ontology curation from scientific literature, and evaluates multiple agent architectures on this challenging task.
Findings
Multi-agent architectures outperform simpler agent architectures.
Scaling backbone models yields diminishing returns.
Agents mainly use retrieval for confirmation, not discovery.
Abstract
Scientific knowledge bases accelerate discovery by curating findings from primary literature into structured, queryable formats for both human researchers and emerging AI systems. Maintaining these resources requires expert curators to search for relevant papers, reconcile evidence across documents, and produce ontology-grounded annotations. Existing benchmarks, which focus on isolated subtasks such as named entity recognition or relation extraction, do not capture this workflow. We present FlyBench to evaluate AI agents on end-to-end agentic ontology curation from scientific literature. Given only a gene symbol, agents must search and read from a corpus of 16,898 full-text papers to produce structured annotations: Gene Ontology terms describing function, expression patterns, and historical synonyms linking decades of nomenclature. The benchmark includes 7,397 expert-curated annotations across…
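The task interface can be made concrete with a short sketch. This is a minimal Python illustration under stated assumptions: `CurationOutput` and `find_candidate_papers` are hypothetical names, and neither is FlyBench's actual output schema or retrieval tool. The excerpt above only specifies that the input is a gene symbol and that the outputs are GO terms, expression patterns, and synonyms. The search step is deliberately reduced to a whole-token keyword match.

```python
from dataclasses import dataclass, field


@dataclass
class CurationOutput:
    """Hypothetical record of one gene's annotations (field names are assumptions)."""
    gene_symbol: str
    go_terms: set[str] = field(default_factory=set)             # GO IDs, e.g. "GO:0007165"
    expression_patterns: set[str] = field(default_factory=set)  # free-text or ontology terms
    synonyms: set[str] = field(default_factory=set)             # historical names for the gene


def find_candidate_papers(gene_symbol: str, corpus: dict[str, str]) -> list[str]:
    """Naive stand-in for the search step (hypothetical): return the sorted IDs
    of papers whose text contains the symbol as a whole token."""
    hits = []
    for paper_id, text in corpus.items():
        # Strip simple punctuation so "dpp," still matches "dpp".
        tokens = text.replace(",", " ").replace(".", " ").split()
        if gene_symbol in tokens:
            hits.append(paper_id)
    return sorted(hits)
```

For example, with a toy corpus `{"p1": "Loss of dpp disrupts wing patterning.", "p2": "We study hedgehog signaling.", "p3": "dpp, also called decapentaplegic, is a morphogen."}`, `find_candidate_papers("dpp", corpus)` returns `["p1", "p3"]`. An agent would then read those papers and populate a `CurationOutput("dpp")`, for instance by adding `"decapentaplegic"` to `synonyms`. Real agents in the benchmark search 16,898 full-text papers, so exact keyword matching like this would miss renamed or differently written gene mentions. That gap is exactly why historical synonyms are part of the task.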
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Topic Modeling · Semantic Web and Ontologies
