A Practical Guide Towards Interpreting Time-Series Deep Clinical Predictive Models: A Reproducibility Study

Yongda Fan; John Wu; Andrea Fitzpatrick; Naveen Baskaran; Jimeng Sun; Adam Cross

arXiv:2603.24828·cs.LG·March 27, 2026

Open Access

TL;DR

This study systematically benchmarks interpretability methods for deep clinical time-series models, highlighting the effectiveness of attention mechanisms, limitations of black-box explainers, and the need for more reliable interpretability approaches.

Contribution

It provides a comprehensive, reproducible benchmark evaluating interpretability methods across clinical tasks and model architectures, offering guidelines for improving interpretability in clinical models.

Findings

1. Attention mechanisms, when leveraged properly, are an efficient way to interpret model predictions.

2. Black-box interpretability methods like KernelSHAP and LIME are computationally infeasible for time-series data.

3. Many interpretability approaches are too unreliable to be trusted.
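
The finding on KernelSHAP and LIME can be made concrete with a rough cost estimate. The sketch below is not the paper's measurement; it only counts forward passes for a KernelSHAP-style explainer under stated assumptions: every (timestep, feature) cell is treated as one input feature, the coalition budget follows a commonly used heuristic of 2M + 2048 samples for M input features, and each sampled coalition is evaluated against every background row. The function name and all numbers (48 timesteps, 76 features, 100 background rows, 1000 explained instances) are hypothetical.

```python
def kernel_shap_forward_passes(num_timesteps, num_features,
                               background_size, num_instances):
    """Rough count of model evaluations for a KernelSHAP-style explainer.

    Assumptions (illustrative, not taken from the paper):
    - each (timestep, feature) cell counts as one input feature
    - coalition budget per instance is 2 * M + 2048
    - each coalition is evaluated once per background row
    """
    m = num_timesteps * num_features          # flattened input size M
    coalitions_per_instance = 2 * m + 2048    # assumed sampling heuristic
    return num_instances * coalitions_per_instance * background_size


# Hypothetical setting: 48 timesteps, 76 features per step,
# 100 background rows, 1000 instances to explain.
total = kernel_shap_forward_passes(48, 76, 100, 1000)
print(f"{total:,} forward passes")
```

By comparison, a gradient-based attribution typically needs on the order of one forward and one backward pass per instance, so under these assumptions the perturbation-based budget is larger by several orders of magnitude.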

Abstract

Clinical decisions are high-stakes and require explicit justification, making model interpretability essential for auditing deep clinical models prior to deployment. As the ecosystem of model architectures and explainability methods expands, critical questions remain: Do architectural features like attention improve explainability? Do interpretability approaches generalize across clinical tasks? While prior benchmarking efforts exist, they often lack extensibility and reproducibility and, critically, fail to systematically examine how interpretability varies across the interplay of clinical tasks and model architectures. To address these gaps, we present a comprehensive benchmark evaluating interpretability methods across diverse clinical prediction tasks and model architectures. Our analysis reveals that: (1) attention, when leveraged properly, is a highly efficient approach for…

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Machine Learning in Healthcare · Artificial Intelligence in Healthcare and Education