Process-Supervised Multi-Agent Reinforcement Learning for Reliable Clinical Reasoning

Chaeeun Lee; T. Michael Yates; Pasquale Minervini; T. Ian Simpson

arXiv:2602.14160 · cs.AI · February 17, 2026

Open Access

TL;DR

This paper presents a multi-agent reinforcement learning framework for clinical reasoning that emphasizes process-grounded decision-making, improving both outcome accuracy and process fidelity in gene-disease validity curation.

Contribution

It introduces a process-supervised multi-agent RL approach with hierarchical coordination, enhancing clinical reasoning alignment and accuracy over outcome-only methods.

Findings

01. With outcome-only rewards, the MAS with a GRPO-trained Qwen3-4B supervisor agent raised outcome accuracy from 0.195 to 0.732.

02. With combined process and outcome rewards, process fidelity increased from 0.392 to 0.520.

03. The hierarchical multi-agent system balances reasoning quality and decision accuracy.

Abstract

Clinical decision-making requires nuanced reasoning over heterogeneous evidence and traceable justifications. While recent LLM multi-agent systems (MAS) show promise, they largely optimise for outcome accuracy while overlooking process-grounded reasoning aligned with clinical standards. One critical real-world case of this is gene-disease validity curation, where experts must determine whether a gene is causally implicated in a disease by synthesising diverse biomedical evidence. We introduce an agent-as-tool reinforcement learning framework for this task with two objectives: (i) process-level supervision to ensure reasoning follows valid clinical pathways, and (ii) efficient coordination via a hierarchical multi-agent system. Our evaluation on the ClinGen dataset shows that with outcome-only rewards, MAS with a GRPO-trained Qwen3-4B supervisor agent substantially improves final outcome…
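The abstract contrasts outcome-only rewards with rewards that also score the reasoning process, and names GRPO as the training method for the supervisor agent. As a rough illustration only, the sketch below blends a binary outcome reward with a process-fidelity score and then computes group-relative advantages in the style of GRPO (each reward normalised by the mean and standard deviation of its rollout group). The function names, the linear blend, the `process_weight` value, and the use of the population standard deviation are all assumptions made for this example; the paper's actual reward design is not given in the text above.

```python
from statistics import mean, pstdev


def combined_reward(outcome_correct: bool, process_score: float,
                    process_weight: float = 0.5) -> float:
    """Blend a binary outcome reward with a process score in [0, 1].

    Hypothetical linear blend; process_weight=0.0 reduces to an
    outcome-only reward.
    """
    outcome = 1.0 if outcome_correct else 0.0
    return (1.0 - process_weight) * outcome + process_weight * process_score


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalise each reward against its own rollout group.

    GRPO-style baseline: subtract the group mean and divide by the
    group standard deviation (population std assumed here).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical rollouts for one curation case:
# (final classification correct?, process-fidelity score)
rollouts = [(True, 0.8), (True, 0.3), (False, 0.6), (False, 0.1)]
rewards = [combined_reward(correct, score) for correct, score in rollouts]
advantages = group_relative_advantages(rewards)
```

Under this blend, two rollouts that reach the same correct classification still receive different advantages when their reasoning scores differ, which is the behaviour that an outcome-only reward cannot express.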

Taxonomy

Topics: Genomics and Rare Diseases · Explainable Artificial Intelligence (XAI) · Machine Learning in Healthcare