MedAI: Evaluating TxAgent's Therapeutic Agentic Reasoning in the NeurIPS CURE-Bench Competition
Tim Cofala, Christian Kalfar, Jingge Xiao, Johanna Schrader, Michelle Tang, Wolfgang Nejdl

TL;DR
This paper evaluates TxAgent, an agentic AI system for therapeutic decision-making, demonstrating how retrieval-augmented reasoning and tool integration improve safety and accuracy in clinical AI applications, as showcased in the NeurIPS CURE-Bench Challenge.
Contribution
It introduces a novel evaluation protocol for medical AI reasoning and tool usage, highlighting the impact of retrieval quality on therapeutic decision-making performance.
Findings
Retrieval quality significantly affects model accuracy.
Improved tool-retrieval strategies enhance reasoning performance.
TxAgent received the NeurIPS CURE-Bench Excellence Award.
Abstract
Therapeutic decision-making in clinical medicine constitutes a high-stakes domain in which AI guidance must account for complex interactions among patient characteristics, disease processes, and pharmacological agents. Tasks such as drug recommendation, treatment planning, and adverse-effect prediction demand robust, multi-step reasoning grounded in reliable biomedical knowledge. Agentic AI methods, exemplified by TxAgent, address these challenges through iterative retrieval-augmented generation (RAG). TxAgent employs a fine-tuned Llama-3.1-8B model that dynamically generates and executes function calls to a unified biomedical tool suite (ToolUniverse), integrating FDA Drug API, OpenTargets, and Monarch resources to ensure access to current therapeutic information. In contrast to general-purpose RAG systems, medical applications impose stringent safety constraints, rendering the accuracy…
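To make the iterative generate-and-execute pattern described in the abstract more concrete, here is a minimal Python sketch of a tool-calling loop. It is an illustration under stated assumptions, not TxAgent's or ToolUniverse's actual code: `ToolCall`, `stub_policy`, `run_agent`, and the two stub tools are hypothetical names, the policy function stands in for the fine-tuned language model, and the tools return placeholder strings rather than real biomedical data.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical stand-ins: none of these names come from TxAgent or ToolUniverse.
ToolFn = Callable[[str], str]


@dataclass
class ToolCall:
    name: str      # which tool the policy wants to invoke
    argument: str  # the single string argument passed to that tool


def stub_label_lookup(drug: str) -> str:
    # Placeholder for a drug-label lookup; returns dummy text, not real data.
    return f"[placeholder label record for {drug}]"


def stub_target_lookup(drug: str) -> str:
    # Placeholder for a drug-target lookup; returns dummy text, not real data.
    return f"[placeholder target record for {drug}]"


TOOLS: Dict[str, ToolFn] = {
    "label_lookup": stub_label_lookup,
    "target_lookup": stub_target_lookup,
}


def stub_policy(question: str, evidence: List[str]) -> Optional[ToolCall]:
    """Stand-in for the language model: choose the next tool call, or None to stop."""
    plan = ["label_lookup", "target_lookup"]
    if len(evidence) < len(plan):
        return ToolCall(plan[len(evidence)], question)
    return None


def run_agent(
    question: str,
    policy: Callable[[str, List[str]], Optional[ToolCall]],
    tools: Dict[str, ToolFn],
    max_steps: int = 5,
) -> List[str]:
    """Alternate between asking the policy for a call and executing it."""
    evidence: List[str] = []
    for _ in range(max_steps):
        call = policy(question, evidence)
        if call is None:  # the policy decided it has enough evidence
            break
        tool = tools.get(call.name)
        if tool is None:
            # Record the failure so the policy can react on the next step.
            evidence.append(f"[unknown tool: {call.name}]")
            continue
        evidence.append(tool(call.argument))
    return evidence
```

Calling `run_agent("drug_x", stub_policy, TOOLS)` collects one placeholder record from each stub tool and then stops. The `max_steps` cap is a design choice of this sketch: it bounds the loop even if a policy never returns `None`.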
Taxonomy
Topics: Machine Learning in Healthcare · Biomedical Text Mining and Ontologies · Topic Modeling
