OwkinZero: Accelerating Biological Discovery with AI

Nathan Bigaud; Vincent Cabeli; Meltem Gürel; Arthur Pignet; John Klein; Gilles Wainrib; Eric Durand

arXiv:2508.16315 · cs.LG · August 26, 2025

TL;DR

OwkinZero introduces specialized models, post-trained from open-source LLMs on curated biological question-and-answer datasets, that outperform larger commercial LLMs on biological reasoning benchmarks relevant to drug discovery.

Contribution

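The paper presents a collection of eight benchmark datasets (over 300,000 verifiable question-and-answer pairs) and a Reinforcement Learning from Verifiable Rewards approach for post-training open-source LLMs into specialized models for biological reasoning tasks.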
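To make the idea of a verifiable reward concrete, here is a minimal sketch in Python. It is not the authors' implementation: the `<answer>...</answer>` output format, the function names, and the exact-match scoring rule are all assumptions chosen for illustration. The only point it illustrates is that each question has a reference answer that can be checked programmatically, so the reward needs no learned judge.

```python
import re
from typing import Optional

# Assumption: the model is prompted to wrap its final answer in
# <answer>...</answer> tags. This format is hypothetical, not from the paper.
ANSWER_PATTERN = re.compile(r"<answer>(.*?)</answer>", re.DOTALL | re.IGNORECASE)


def extract_answer(completion: str) -> Optional[str]:
    """Return the last tagged answer, normalized, or None if none is present."""
    matches = ANSWER_PATTERN.findall(completion)
    if not matches:
        return None
    return matches[-1].strip().lower()


def verifiable_reward(completion: str, reference: str) -> float:
    """Score 1.0 when the extracted answer matches the reference, else 0.0.

    Exact match after normalization is an illustrative rule; a real setup
    might also reward well-formed output or accept equivalent answers.
    """
    predicted = extract_answer(completion)
    if predicted is None:
        return 0.0
    return 1.0 if predicted == reference.strip().lower() else 0.0
```

In a reinforcement learning loop, a function like `verifiable_reward` would be called on each sampled completion, and the resulting scalar would drive the policy update.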

Findings

1. OwkinZero models outperform larger commercial LLMs on biological benchmarks.

2. Specialist models trained on single tasks outperform base models on unseen tasks.

3. Training on diverse datasets enhances cross-task generalization.

Abstract

While large language models (LLMs) are rapidly advancing scientific research, they continue to struggle with core biological reasoning tasks essential for translational and biomedical discovery. To address this limitation, we created and curated eight comprehensive benchmark datasets comprising over 300,000 verifiable question-and-answer pairs, each targeting critical challenges in drug discovery including target druggability, modality suitability, and drug perturbation effects. Using this resource, we developed the OwkinZero models by post-training open-source LLMs through a Reinforcement Learning from Verifiable Rewards strategy. Our results demonstrate that specialized 8-32B OwkinZero models substantially outperform larger, state-of-the-art commercial LLMs on these biological benchmarks. Remarkably, we uncover evidence of a key aspect of generalization: specialist models trained on a…
