TARSE: Test-Time Adaptation via Retrieval of Skills and Experience for Reasoning Agents
Junda Wang, Zonghai Tao, Hansi Zeng, Zhichao Yang, Hamed Zamani, Hong Yu

TL;DR
This paper introduces TARSE, a method that improves clinical question answering by retrieving relevant skills and experiences and performing test-time adaptation to enhance reasoning accuracy.
Contribution
The paper presents a framework that explicitly retrieves clinical skills and experiences at test time and uses them to align a medical agent's intermediate reasoning with clinically valid logic.
Findings
Consistent performance improvements over baseline methods.
Effective retrieval of relevant skills and experiences.
Enhanced reasoning accuracy in medical question answering.
Abstract
Complex clinical decision making often fails not because a model lacks facts, but because it cannot reliably select and apply the right procedural knowledge and the right prior example at the right reasoning step. We frame clinical question answering as an agent problem with two explicit, retrievable resources: skills, reusable clinical procedures such as guidelines, protocols, and pharmacologic mechanisms; and experience, verified reasoning trajectories from previously solved cases (e.g., chain-of-thought solutions and their step-level decompositions). At test time, the agent retrieves both relevant skills and experiences from curated libraries and performs lightweight test-time adaptation to align the language model's intermediate reasoning with clinically valid logic. Concretely, we build (i) a skills library from guideline-style documents organized as executable decision rules, (ii)…
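The retrieval step described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the `Entry` type, the `retrieve` and `build_prompt` helpers, the bag-of-words cosine scoring, and the toy library entries are all hypothetical. The sketch covers only retrieving from the two libraries and assembling a prompt; the test-time adaptation step is not shown.

```python
from collections import Counter
from dataclasses import dataclass
import math
import re


@dataclass
class Entry:
    kind: str  # "skill" or "experience" (hypothetical labels)
    text: str


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, library, k=1):
    """Return the k library entries most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(
        library,
        key=lambda e: cosine(q, Counter(tokenize(e.text))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, skills, experiences, k=1):
    """Assemble a prompt from the top-k skills and top-k experiences."""
    lines = ["Relevant skills:"]
    lines += [f"- {e.text}" for e in retrieve(question, skills, k)]
    lines.append("Relevant experience:")
    lines += [f"- {e.text}" for e in retrieve(question, experiences, k)]
    lines.append(f"Question: {question}")
    return "\n".join(lines)


# Toy placeholder libraries, not clinical guidance and not the paper's data.
SKILLS = [
    Entry("skill", "sepsis protocol: check lactate, give fluids, start antibiotics"),
    Entry("skill", "anticoagulation protocol: check renal function before dosing"),
]
EXPERIENCES = [
    Entry("experience", "solved case: septic shock patient, lactate rechecked after fluids, then vasopressors"),
    Entry("experience", "solved case: atrial fibrillation, stroke risk scored before anticoagulation"),
]

QUESTION = "patient with suspected sepsis and high lactate"
PROMPT = build_prompt(QUESTION, SKILLS, EXPERIENCES, k=1)
print(PROMPT)
```

Keeping skills and experiences in separate libraries, as the abstract describes, lets each be retrieved with its own budget `k`. A real system would likely replace the bag-of-words scorer with a learned retriever.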
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Machine Learning in Healthcare
