Deliberative multi-agent large language models improve clinical reasoning in ophthalmology
Ehsan Misaghi, Sean T Berkowitz, Bing Yu Chen, Qingyu Chen, Renaud Duval, Pearse A Keane, Danny A Mammo, Ariel Yuhan Ong, Mertcan Sevgi, Sumit Sharma, Sunil K Srivastava, Yih Chung Tham, Fares Antaki

TL;DR
This study demonstrates that multi-agent deliberative councils of large language models significantly improve diagnostic accuracy and safety in ophthalmology clinical reasoning compared to individual models, reducing harm and enhancing reliability.
Contribution
It introduces a multi-agent council framework that leverages structured deliberation among LLMs to improve clinical reasoning and mitigate risks in ophthalmology diagnostics.
Findings
Councils outperform pooled individual models in accuracy across all three tiers (proprietary flagship, proprietary fast, and open-source).
Harm rates are significantly reduced with councils.
Councils produce more complete differentials and management plans.
Abstract
Large language models (LLMs) show potential for ophthalmic clinical reasoning, yet individual models risk introducing harm. We evaluated whether multi-agent LLM deliberative councils improve diagnostic performance and mitigate harm compared to individual LLMs. In a comparative cross-sectional study, we assessed 12 individual LLMs and three multi-agent councils on 100 ophthalmology clinical vignettes. Each council comprised four models assembled by type: proprietary flagship, proprietary fast, and open-source. Within each council, models independently answered each vignette and anonymously ranked one another's responses; a designated chair then synthesized all responses and peer reviews into a final answer. Councils consistently outperformed pooled individual models across all three tiers. Accuracy improved for proprietary flagship (95.0% vs 90.8%; risk difference [RD]: 4.25 [95% CI: 0.45, 8.05]), proprietary fast…
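The three-stage council procedure described in the abstract (independent answers, anonymous peer ranking, chair synthesis) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the `Model` callable interface, the `run_council` function, the prompt wording, and the "Response A/B/C" anonymization labels are all hypothetical. It also assumes each member ranks only its peers' responses and that the chair is one of the council members; the abstract does not specify either detail.

```python
from typing import Callable, Dict

# Hypothetical interface: a "model" is any callable that maps a prompt to text.
Model = Callable[[str], str]


def run_council(vignette: str, members: Dict[str, Model], chair: str) -> str:
    """Run one vignette through an illustrative three-stage council."""
    if chair not in members:
        raise ValueError("chair must be one of the council members")

    # Stage 1: every member answers the vignette independently.
    answers = {name: ask(vignette) for name, ask in members.items()}

    # Anonymize: peers see "Response A", "Response B", ... not model names.
    labels = {
        name: f"Response {chr(ord('A') + i)}" for i, name in enumerate(answers)
    }

    # Stage 2: each member ranks the other members' anonymized responses
    # (assumption: a member does not rank its own response).
    reviews = {}
    for reviewer, ask in members.items():
        peers = "\n".join(
            f"{labels[name]}: {text}"
            for name, text in answers.items()
            if name != reviewer
        )
        reviews[reviewer] = ask(
            f"Vignette:\n{vignette}\n\n"
            f"Rank these responses from best to worst:\n{peers}"
        )

    # Stage 3: the chair sees all responses and all peer reviews,
    # then writes the final answer.
    all_answers = "\n".join(f"{labels[n]}: {t}" for n, t in answers.items())
    all_reviews = "\n".join(
        f"Review {i + 1}: {r}" for i, r in enumerate(reviews.values())
    )
    return members[chair](
        f"Vignette:\n{vignette}\n\nResponses:\n{all_answers}\n\n"
        f"Peer reviews:\n{all_reviews}\n\nSynthesize a final answer."
    )
```

With four members, one vignette costs nine model calls in this sketch: four independent answers, four peer reviews, and one chair synthesis. In practice each callable would wrap a real LLM API client; any callable with the same signature works for testing.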
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Clinical Reasoning and Diagnostic Skills · Machine Learning in Healthcare
