TxGemma: Efficient and Agentic LLMs for Therapeutics
Eric Wang, Samuel Schmidgall, Paul F. Jaeger, Fan Zhang, Rory Pilgrim, Yossi Matias, Joelle Barral, David Fleet, Shekoofeh Azizi

TL;DR
TxGemma is a suite of efficient, generalist large language models for therapeutic development. The models support property prediction, interactive reasoning, and explainability; they match or exceed state-of-the-art generalist and specialist models on most of the 66 evaluated tasks, and they enable data-efficient fine-tuning.
Contribution
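The paper presents TxGemma, a set of large language models (2B, 9B, and 27B parameters, fine-tuned from Gemma-2) designed for broad therapeutic applications, including interactive reasoning and agentic capabilities, with performance competitive with state-of-the-art generalist and specialist models and with data-efficient fine-tuning.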
Findings
TxGemma matches or exceeds the state-of-the-art generalist model on 64 of 66 tasks (superior on 45), and state-of-the-art specialist models on 50 of 66 (superior on 26).
Fine-tuning TxGemma on therapeutic tasks requires less training data than fine-tuning base LLMs.
Agentic-Tx surpasses previous models on multiple scientific benchmarks.
Abstract
Therapeutic development is a costly and high-risk endeavor that is often plagued by high failure rates. To address this, we introduce TxGemma, a suite of efficient, generalist large language models (LLMs) capable of therapeutic property prediction as well as interactive reasoning and explainability. Unlike task-specific models, TxGemma synthesizes information from diverse sources, enabling broad application across the therapeutic development pipeline. The suite includes 2B, 9B, and 27B parameter models, fine-tuned from Gemma-2 on a comprehensive dataset of small molecules, proteins, nucleic acids, diseases, and cell lines. Across 66 therapeutic development tasks, TxGemma achieved superior or comparable performance to the state-of-the-art generalist model on 64 (superior on 45), and against state-of-the-art specialist models on 50 (superior on 26). Fine-tuning TxGemma models on…
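The headline counts in the abstract can be turned into shares with a few lines of arithmetic. The sketch below is illustrative only: it uses the four counts reported above, and the helper name `summarize` and the derived "comparable only" and "not comparable" figures are assumptions introduced here, computed by subtraction rather than reported by the authors.

```python
# Counts as reported in the abstract; no new data is introduced here.
TOTAL_TASKS = 66

reported = {
    "vs. generalist SOTA": {"superior_or_comparable": 64, "superior": 45},
    "vs. specialist SOTA": {"superior_or_comparable": 50, "superior": 26},
}


def summarize(counts, total=TOTAL_TASKS):
    """Derive shares plus the implied 'comparable only' and 'not comparable' counts.

    Hypothetical helper: the derived counts are inferred by subtraction.
    """
    at_least = counts["superior_or_comparable"]
    superior = counts["superior"]
    return {
        "superior_share": superior / total,
        "superior_or_comparable_share": at_least / total,
        "comparable_only": at_least - superior,
        "not_comparable": total - at_least,
    }


summary = {name: summarize(counts) for name, counts in reported.items()}

for name, s in summary.items():
    print(
        f"{name}: superior on {s['superior_share']:.1%} of tasks, "
        f"superior or comparable on {s['superior_or_comparable_share']:.1%}"
    )
```

Read this way, "64 of 66" describes tasks where TxGemma is at least comparable to the generalist baseline, while strict wins account for 45 of the 66.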
Taxonomy
Topics: Computational Drug Discovery Methods · Machine Learning in Materials Science · Machine Learning in Bioinformatics
Methods: Balanced Selection
