ECG-Reasoning-Benchmark: A Benchmark for Evaluating Clinical Reasoning Capabilities in ECG Interpretation
Jungwoo Oh, Hyunseung Chung, Junhee Lee, Min-Gyu Kim, Hangyul Yoon, Ki Seong Lee, Youngchae Lee, Muhan Yeo, Edward Choi

TL;DR
This paper introduces ECG-Reasoning-Benchmark, a comprehensive evaluation framework revealing that current multimodal models lack genuine step-by-step reasoning in ECG interpretation, often relying on superficial cues instead of true visual understanding.
Contribution
The paper presents a new benchmark with over 6,400 samples to systematically assess reasoning in ECG diagnosis, exposing significant gaps in current models' logical deduction capabilities.
Findings
Models can retrieve the clinical criteria for a diagnosis but complete the full reasoning chain in only about 6% of cases.
Models fail to ground ECG findings to visual evidence.
Current training paradigms do not foster true visual reasoning in ECG interpretation.
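The gap between knowing the criteria and finishing the chain can be illustrated with a small sketch. It assumes, purely for illustration, that a sample counts as "completed" only when every turn in its reasoning chain is answered correctly; the summary does not spell out the benchmark's exact scoring rule. The names `ChainResult`, `completion_rate`, and `mean_step_accuracy` and the toy data are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChainResult:
    """Per-sample outcome of a multi-turn evaluation (hypothetical structure)."""
    step_correct: List[bool]  # one flag per reasoning turn, in order


def completion_rate(results: List[ChainResult]) -> float:
    """Fraction of samples whose every step is correct (assumed definition)."""
    if not results:
        return 0.0
    completed = sum(1 for r in results if r.step_correct and all(r.step_correct))
    return completed / len(results)


def mean_step_accuracy(results: List[ChainResult]) -> float:
    """Fraction of individual steps answered correctly, pooled over samples."""
    total = sum(len(r.step_correct) for r in results)
    if total == 0:
        return 0.0
    correct = sum(sum(r.step_correct) for r in results)
    return correct / total


# Toy data invented for this example; not results from the paper.
toy = [
    ChainResult([True, True, True]),
    ChainResult([True, False, True]),
    ChainResult([True, True, False]),
    ChainResult([True, False, False]),
]

print(completion_rate(toy))     # 0.25
print(mean_step_accuracy(toy))  # 8 of 12 steps correct
```

Under this assumed definition, a model can answer most individual steps correctly and still complete very few chains, because a single wrong step fails the whole sample.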
Abstract
While Multimodal Large Language Models (MLLMs) show promising performance in automated electrocardiogram interpretation, it remains unclear whether they genuinely perform step-by-step reasoning or merely rely on superficial visual cues. To investigate this, we introduce ECG-Reasoning-Benchmark, a novel multi-turn evaluation framework comprising over 6,400 samples to systematically assess step-by-step reasoning across 17 core ECG diagnoses. Our comprehensive evaluation of state-of-the-art models reveals a critical failure in executing multi-step logical deduction. Although models possess the medical knowledge to retrieve clinical criteria for a diagnosis, they exhibit near-zero success rates (6% Completion) in maintaining a complete reasoning chain, primarily failing to ground the corresponding ECG findings to the actual visual evidence in the ECG signal. These results…
Taxonomy
Topics: ECG Monitoring and Analysis · Explainable Artificial Intelligence (XAI) · Machine Learning in Healthcare
