Interpretable Multimodal Zero-Shot ECG Diagnosis via Structured Clinical Knowledge Alignment
Jialu Tang, Hung Manh Pham, Ignace De Lathauwer, Henk S. Schipper, Yuan Lu, Dong Ma, Aaqib Saeed

TL;DR
ZETA is a zero-shot multimodal framework that makes ECG diagnosis more interpretable by aligning ECG signals with structured clinical observations, improving transparency and generalization without disease-specific training.
Contribution
The paper introduces ZETA, a zero-shot ECG diagnosis model that uses structured clinical knowledge and multimodal embedding alignment for improved interpretability and generalization.
Findings
Competitive zero-shot classification performance
Enhanced interpretability with clinically relevant features
Predictions grounded in positive and negative diagnostic evidence
Abstract
Electrocardiogram (ECG) interpretation is essential for cardiovascular disease diagnosis, but current automated systems often struggle with transparency and generalization to unseen conditions. To address this, we introduce ZETA, a zero-shot multimodal framework designed for interpretable ECG diagnosis aligned with clinical workflows. ZETA uniquely compares ECG signals against structured positive and negative clinical observations, which are curated through an LLM-assisted, expert-validated process, thereby mimicking differential diagnosis. Our approach leverages a pre-trained multimodal model to align ECG and text embeddings without disease-specific fine-tuning. Empirical evaluations demonstrate ZETA's competitive zero-shot classification performance and, importantly, provide qualitative and quantitative evidence of enhanced interpretability, grounding predictions in specific,…
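The comparison of an ECG against positive and negative observations can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state the scoring rule, so the rule below (mean cosine similarity to positive observations minus mean cosine similarity to negative observations) is an assumption, as are the function names `cosine_sim` and `zero_shot_score`, the condition labels, and the toy 3-dimensional vectors that stand in for embeddings from a pre-trained multimodal model.

```python
import numpy as np


def cosine_sim(vec, rows):
    """Cosine similarity between one vector (d,) and each row of a matrix (n, d)."""
    vec = vec / np.linalg.norm(vec)
    rows = rows / np.linalg.norm(rows, axis=1, keepdims=True)
    return rows @ vec


def zero_shot_score(ecg_emb, pos_embs, neg_embs):
    """Hypothetical scoring rule (an assumption, not taken from the paper):
    mean similarity to positive observations minus mean similarity to
    negative observations. The per-observation similarities are returned
    too, so each piece of evidence can be inspected."""
    pos = cosine_sim(ecg_emb, pos_embs)
    neg = cosine_sim(ecg_emb, neg_embs)
    return float(pos.mean() - neg.mean()), pos, neg


# Toy stand-ins for embeddings; a real system would obtain these from
# the ECG encoder and text encoder of a pre-trained multimodal model.
ecg = np.array([1.0, 0.0, 0.0])
observations = {
    "condition_A": {
        "pos": np.array([[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]]),
        "neg": np.array([[0.0, 1.0, 0.0]]),
    },
    "condition_B": {
        "pos": np.array([[0.0, 1.0, 0.0]]),
        "neg": np.array([[1.0, 0.0, 0.0]]),
    },
}

# Score every candidate condition and pick the highest; no training step.
scores = {
    name: zero_shot_score(ecg, obs["pos"], obs["neg"])[0]
    for name, obs in observations.items()
}
predicted = max(scores, key=scores.get)
print(predicted, scores)
```

Because the per-observation similarities are kept alongside the aggregate score, a prediction in this sketch can be traced back to the individual observations that supported or contradicted it, which is the kind of evidence-level grounding the abstract describes.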
