Interpretable Pre-Trained Transformers for Heart Time-Series Data
Harry J. Davies, James Monsen, Danilo P. Mandic

TL;DR
This paper introduces interpretable pre-trained transformer models for analyzing heart time-series data, demonstrating their effectiveness in classification and beat detection tasks with high accuracy and explainability.
Contribution
The work develops two pre-trained transformer models for cardiac data that are fully interpretable and easily fine-tuned for clinical tasks, enhancing transparency and performance.
Findings
Models focus on physiologically relevant features such as the P-wave (ECG) and the dicrotic notch (PPG).
Fine-tuning achieves high AUCs of 0.99 and 0.93 for atrial fibrillation (AF) detection.
State-of-the-art F1 score of 98% for beat detection.
Abstract
Decoder-only transformers are the backbone of the popular generative pre-trained transformer (GPT) series of large language models. In this work, we apply this framework to the analysis of clinical heart time-series data, to create two pre-trained general purpose cardiac models, termed PPG-PT and ECG-PT. We place a special emphasis on making both pre-trained models fully interpretable. This is achieved firstly through aggregate attention maps which show that, in order to make predictions, the model focuses on similar points in previous cardiac cycles and gradually broadens its attention in deeper layers. Next, we show that tokens with the same value, which occur at different distinct points in the electrocardiography (ECG) and photoplethysmography (PPG) cycle, form separate clusters in high dimensional space. The clusters form according to phase, as the tokens propagate through…
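The aggregate attention maps mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the random projections stand in for learned weights, the sizes are arbitrary, and averaging the per-head weights is only one assumed way of aggregating them. It shows the causal mask that a decoder-only transformer applies, so that each position attends only to itself and earlier positions, and how per-head attention weights can be collapsed into a single map.

```python
import numpy as np

def causal_attention_weights(x, w_q, w_k):
    """Return causal softmax attention weights for one head.

    x: (T, d_model) token embeddings; w_q, w_k: (d_model, d_head) projections.
    """
    q = x @ w_q
    k = x @ w_k
    scores = (q @ k.T) / np.sqrt(k.shape[-1])
    # Decoder-only masking: position t may only attend to positions <= t.
    future = np.triu(np.ones(scores.shape, dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    # Subtract the row maximum for numerical stability before exponentiating.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

def aggregate_attention_map(x, heads):
    """Average the per-head attention weights into one (T, T) map (assumed aggregation)."""
    maps = [causal_attention_weights(x, w_q, w_k) for w_q, w_k in heads]
    return np.mean(maps, axis=0)

rng = np.random.default_rng(0)
T, d_model, d_head, n_heads = 8, 16, 4, 4  # illustrative sizes only
x = rng.normal(size=(T, d_model))          # stand-in for token embeddings
heads = [
    (rng.normal(size=(d_model, d_head)), rng.normal(size=(d_model, d_head)))
    for _ in range(n_heads)
]
agg = aggregate_attention_map(x, heads)    # row t = how position t spreads its attention
```

With a trained model, row `t` of such a map would be inspected to see which earlier samples, for example the same phase of a previous cardiac cycle, receive the most weight.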
Peer Reviews
Decision: Submitted to ICLR 2025
- Interpretability analysis on physiological time series
- The model was tested on two types of physiological time series: PPG and single-lead ECG.
1. The model's novelty is questionable, as PPG-GPT has previously been explored across multiple tasks (Chen et al., 2024). It would be helpful to specify how the current work differs from it.
2. The manuscript does not compare the models with the previous PPG-GPT (Chen et al., 2024) on different PPG or ECG tasks. The comparison with one baseline each for PPG and ECG is mentioned only in the appendix. Importantly, the models are compared on performance using different setups.
3. The experiments have
1. To the best of my knowledge, this is the first work that applies generative pre-training to representation learning for heart time-series data, where previously contrastive learning methods were the most popular (also inherited from other domains like voice recognition). This is a novel and interesting idea.
2. The attention mechanism analysis is very detailed and provides a clear understanding of the learned representations, and how the transformer layers work on the heart time-series data.
1. The datasets used in the experiments are not comprehensive enough. The authors used the CinC2020 dataset, which is superseded by the CinC2021 dataset. The latter dataset is more comprehensive and contains more data. Moreover, there are other larger datasets available, such as the CODE-15% dataset (https://zenodo.org/records/4916206), etc.
2. The authors did not compare their method in their numerical experiments with other representation learning methods, such as contrastive learning-based m
- Related GPT-based interpretation studies for ECG and PPG exist in the literature, but the experimentation carried out in this study is rigorous and systematic in revealing the explanations, first for the generation of the next token and finally, by extending the idea, for a downstream classification task.
- The idea of decoding attention heads for explanations is common in the CV and NLP domains, but extending it to physiological time series can be seen as a contribution.
- Data split follows a subject-wise s
Major:
- The explainability analysis shows that the GPT models focus on previous cycles, meaning that a beat is observed with respect to the previous one. This is why the interpretation works well for a downstream task where the distance between two consecutive beats is an important criterion. However, this might not be the case for other common tasks such as sleep staging, where the input is a 30-second ECG or PPG signal and the separation of sleep stages such as wake, light sleep, deep sleep and REM sleep arises from HR variability.
Taxonomy
Topics: Time Series Analysis and Forecasting · Machine Learning in Healthcare
Methods: Softmax · Attention Is All You Need
