Evaluating Multi-Agent LLM Architectures for Rare Disease Diagnosis

Ahmed Almasoud

arXiv:2603.06856·cs.MA·March 10, 2026

Open Access

TL;DR

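This study evaluates four agent topologies for rare disease diagnosis (a single-agent Control plus Hierarchical, Adversarial, and Collaborative) and introduces a Reasoning Gap metric. The Hierarchical topology slightly outperforms the others at 50.0% accuracy, while the Adversarial topology drops to 27.3%, showing that added architectural complexity does not always improve accuracy.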

Contribution

The paper systematically compares agent topologies for rare disease diagnosis and introduces the Reasoning Gap metric, which quantifies the difference between internal knowledge retrieval and final diagnostic accuracy.

Findings

01. Hierarchical topology achieves 50.0% accuracy.

02. Adversarial model significantly reduces accuracy to 27.3%.

03. Multi-agent systems outperform single-agent in Bone and Thoracic diseases.

Abstract

While large language models are capable diagnostic tools, the impact of multi-agent topology on diagnostic accuracy remains underexplored. This study evaluates four agent topologies, Control (single agent), Hierarchical, Adversarial, and Collaborative, across 302 cases spanning 33 rare disease categories. We introduce a Reasoning Gap metric to quantify the difference between internal knowledge retrieval and final diagnostic accuracy. Results indicate that the Hierarchical topology (50.0% accuracy) marginally outperforms Collaborative (49.8%) and Control (48.5%) configurations. In contrast, the Adversarial model significantly degrades performance (27.3%), exhibiting a massive Reasoning Gap where valid diagnoses were rejected due to artificial doubt. Across all architectures, performance was strongest in Allergic diseases and Toxic Effects categories but poorest in Cardiac Malformation…
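The abstract describes the Reasoning Gap only in words and does not give a formula here. One plausible reading, offered as a minimal sketch and not as the paper's definition, is the share of cases where the correct diagnosis surfaced somewhere in the agents' reasoning minus the share where it was the final answer. The `CaseResult` record, its field names, the exact-match comparison, and the toy disease labels below are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CaseResult:
    """One evaluated case (hypothetical record layout, not from the paper)."""
    true_diagnosis: str
    candidates: list[str]   # diagnoses raised at any point during reasoning
    final_diagnosis: str    # diagnosis the system finally committed to


def _norm(name: str) -> str:
    # Assumed matching rule: case-insensitive exact match after trimming.
    return name.strip().lower()


def reasoning_gap(results: list[CaseResult]) -> dict[str, float]:
    """Return retrieval rate, final accuracy, and their difference.

    Assumed definition: gap = retrieval_rate - accuracy. A large positive
    gap means the right answer was often raised but then discarded.
    """
    if not results:
        raise ValueError("results must be non-empty")
    n = len(results)
    retrieved = sum(
        any(_norm(c) == _norm(r.true_diagnosis) for c in r.candidates)
        for r in results
    )
    correct = sum(
        _norm(r.final_diagnosis) == _norm(r.true_diagnosis) for r in results
    )
    retrieval_rate = retrieved / n
    accuracy = correct / n
    return {
        "retrieval_rate": retrieval_rate,
        "accuracy": accuracy,
        "reasoning_gap": retrieval_rate - accuracy,
    }


# Toy data with placeholder labels; these are not results from the study.
toy = [
    CaseResult("disease_a", ["disease_a", "disease_b"], "disease_a"),
    CaseResult("disease_c", ["disease_c", "disease_d"], "disease_d"),
    CaseResult("disease_e", ["disease_e"], "disease_f"),
    CaseResult("disease_g", ["disease_h"], "disease_h"),
]
summary = reasoning_gap(toy)
print(summary)
```

In this toy set the correct diagnosis is raised in three of four cases but chosen in only one, which gives a gap of 0.5. Under this reading, the Adversarial topology's behavior described in the abstract (valid diagnoses rejected due to artificial doubt) would show up as a high retrieval rate paired with low final accuracy.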

Taxonomy

Topics: Genomics and Rare Diseases · Explainable Artificial Intelligence (XAI) · Topic Modeling