TL;DR
This paper demonstrates that hybrid collectives combining human physicians with large language models diagnose complex medical cases significantly more accurately than individual physicians, individual LLMs, or collectives made up only of physicians or only of LLMs.
Contribution
The study introduces a hybrid human-AI collective approach that leverages the complementary strengths of physicians and LLMs to enhance diagnostic accuracy in medicine.
Findings
Hybrid collectives outperform individual physicians and LLMs in accuracy.
The approach is effective across multiple medical specialties.
Physicians and LLMs make different types of errors, so combining their diagnoses lets each offset the other's mistakes.
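To make the idea of pooling diagnoses concrete, here is a minimal sketch of one way ranked differential diagnoses from physicians and LLMs could be merged into a single collective ranking. The 1/rank weighting, the function name `pool_differentials`, and the example cases are illustrative assumptions, not the aggregation rule or data reported in the paper.

```python
from collections import defaultdict


def pool_differentials(differentials, top_k=3):
    """Pool ranked diagnosis lists into one collective ranking.

    Each list is ordered from most to least likely. A diagnosis at
    rank r (1-based) adds 1/r to its score. This weighting is an
    illustrative assumption, not the paper's confirmed rule.
    """
    scores = defaultdict(float)
    for ranked in differentials:
        for rank, diagnosis in enumerate(ranked, start=1):
            # Normalize spelling so identical diagnoses pool together.
            scores[diagnosis.strip().lower()] += 1.0 / rank
    # Highest score first; ties broken alphabetically for determinism.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [diagnosis for diagnosis, _ in ordered[:top_k]]


# Made-up differentials for a single hypothetical case.
physicians = [
    ["pulmonary embolism", "pneumonia"],
    ["pneumonia", "pulmonary embolism"],
]
llms = [
    ["pulmonary embolism", "heart failure"],
]

hybrid = pool_differentials(physicians + llms)
print(hybrid)
```

In this toy case the two physicians split evenly between their top two candidates, and the LLM's list tips the pooled ranking. A real system would also need to map free-text diagnoses to a shared vocabulary before pooling, which this sketch reduces to lowercasing.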
Abstract
Artificial intelligence systems, particularly large language models (LLMs), are increasingly being employed in high-stakes decisions that impact both individuals and society at large, often without adequate safeguards to ensure safety, quality, and equity. Yet LLMs hallucinate, lack common sense, and are biased - shortcomings that may reflect LLMs' inherent limitations and thus may not be remedied by more sophisticated architectures, more data, or more human feedback. Relying solely on LLMs for complex, high-stakes decisions is therefore problematic. Here we present a hybrid collective intelligence system that mitigates these risks by leveraging the complementary strengths of human experience and the vast information processed by LLMs. We apply our method to open-ended medical diagnostics, combining 40,762 differential diagnoses made by physicians with the diagnoses of five state-of-the…
