Zodiac: A Cardiologist-Level LLM Framework for Multi-Agent Diagnostics
Yuan Zhou, Peng Zhang, Mengya Song, Alice Zheng, Yiwen Lu, Zhiheng Liu, Yong Chen, Zhaohan Xi

TL;DR
ZODIAC is a multi-agent LLM framework designed to achieve cardiologist-level professionalism in diagnostics, outperforming existing models and integrating into ECG devices for real-world medical applications.
Contribution
Introduces ZODIAC, a multi-agent LLM system with cardiologist-level expertise, built with real patient data and validated clinically for healthcare diagnostics.
Findings
ZODIAC outperforms GPT-4o, Llama-3.1, and Gemini-pro in clinical effectiveness.
ZODIAC achieves high scores across eight clinical metrics.
Successfully integrated into ECG devices for practical use.
Abstract
Large language models (LLMs) have demonstrated remarkable progress in healthcare. However, a significant gap remains regarding LLMs' professionalism in domain-specific clinical practices, limiting their application in real-world diagnostics. In this work, we introduce ZODIAC, an LLM-powered framework with cardiologist-level professionalism designed to engage LLMs in cardiological diagnostics. ZODIAC assists cardiologists by extracting clinically relevant characteristics from patient data, detecting significant arrhythmias, and generating preliminary reports for the review and refinement by cardiologists. To achieve cardiologist-level professionalism, ZODIAC is built on a multi-agent collaboration framework, enabling the processing of patient data across multiple modalities. Each LLM agent is fine-tuned using real-world patient data adjudicated by cardiologists, reinforcing the model's…
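The three stages named in the abstract (characteristic extraction, arrhythmia detection, preliminary report generation) can be pictured as a simple pipeline. The sketch below is illustrative only and is not the authors' implementation: each function is a hypothetical stand-in for a fine-tuned LLM agent, the `PatientRecord` fields are assumed, and the detection step uses plain heart-rate thresholds (above 100 bpm, below 60 bpm) in place of a model.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PatientRecord:
    # Hypothetical fields; the paper's actual input schema is not shown here.
    patient_id: str
    mean_heart_rate_bpm: float


def extract_characteristics(record: PatientRecord) -> Dict[str, float]:
    """Stand-in for the agent that extracts clinically relevant characteristics."""
    return {"mean_heart_rate_bpm": record.mean_heart_rate_bpm}


def detect_arrhythmias(features: Dict[str, float]) -> List[str]:
    """Stand-in for the detection agent: simple rate thresholds, not an LLM."""
    rate = features["mean_heart_rate_bpm"]
    findings: List[str] = []
    if rate > 100:
        findings.append("tachycardia")
    elif rate < 60:
        findings.append("bradycardia")
    return findings


def generate_report(record: PatientRecord,
                    features: Dict[str, float],
                    findings: List[str]) -> str:
    """Stand-in for the reporting agent; output is a draft for cardiologist review."""
    finding_text = ", ".join(findings) if findings else "no rate abnormality flagged"
    return (
        "PRELIMINARY REPORT (pending cardiologist review)\n"
        f"Patient: {record.patient_id}\n"
        f"Mean heart rate: {features['mean_heart_rate_bpm']:.0f} bpm\n"
        f"Findings: {finding_text}"
    )


def run_pipeline(record: PatientRecord) -> str:
    """Chain the three stand-in agents in the order the abstract lists them."""
    features = extract_characteristics(record)
    findings = detect_arrhythmias(features)
    return generate_report(record, features, findings)
```

Calling `run_pipeline(PatientRecord("demo-001", 120.0))` returns a draft report string. In the framework described by the abstract, each stage would be an LLM agent and the output would go to a cardiologist for review and refinement.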
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
- The method and problem are well-defined, and the paper is well-written.
- The proposed approach is built on cardiologist-adjudicated text inputs and is evaluated by clinicians, which enhances its clinical rigor.
- The method incorporates ECG data along with relevant metadata and integrates current clinical guidelines to generate comprehensive reports.
While the proposed method, ZODIAC, addresses a well-defined clinical problem of generating ECG reports and integrates valuable clinical insights through multi-agent collaboration, concerns remain regarding the rigor of its evaluation.
1. Evaluation Metric: The metrics defined in Table 1 are well-defined for qualitative evaluation, but appear subject to evaluator variability, raising concerns about the objectivity and rigor of its scale in Table 2. The following points are regarding the rigor of
- The studied problem is important and practical in healthcare applications.
- The paper includes human validation and discussion on real-world deployment, which provide insights on how recent advances of LLMs can be utilized in clinical settings.
- While ZODIAC effectively uses a multi-agent system, the framework is somewhat derivative of existing LLM-based multi-agent approaches. A more comprehensive comparison of this framework against existing multi-agent approaches in healthcare [1,2] is needed to demonstrate the novelty and effectiveness of the proposed framework.
- The experiments are not convincing.
  * The paper relies on a small validation set (only 5% of the dataset, ~100 patients). It raises concerns about the robustness of
- Well written and nicely presented
- Real-world validation of the approach
Text:
- typo in figure 7b: "signle"
- line 375: remove "is": "Instead of using public benchmarks, we adopt real patient data is to align with practical diagnostics"
General:
- *Representative Groups*: race is not indicated as a statistic.
- Human validation: how many people have provided a report? Was the data that ZODIAC was pre-trained on from the same institution as the physicians? This might explain why clinical-specialist LLMs performed poorly (overfitted to a certain dataset).
- Line 375: *Data
1. This paper is clear and well-written.
2. Using LLMs for clinical decision support is a very important topic, which can help reduce clinician burnout and streamline healthcare administration.
3. The design of the ZODIAC framework is intuitive, extensible and flexible, with a correction mechanism that fact-checks against clinical guidelines to ensure the safety of the generations.
4. The ablation study is quite comprehensive and justifies the design choice of each component.
1. In Section 4.1, it is mentioned that data from 2000+ patients from collaborating healthcare institutions are used, and 5% of the data is used for evaluation. Given that the evaluation set comes from the same distribution as the data used for fine-tuning the LLM agents, I am wondering whether the framework could generalize to a broader patient population, e.g. patients in other hospitals/medical centers, or patients from very different geographic locations or demographics (e.g. different race or
Taxonomy
Topics: Machine Learning in Healthcare · Business Process Modeling and Analysis
