Icentia11K: An Unsupervised Representation Learning Dataset for Arrhythmia Subtype Discovery
Shawn Tan, Guillaume Androz, Ahmad Chamseddine, Pierre Fecteau, Aaron Courville, Yoshua Bengio, Joseph Paul Cohen

TL;DR
This paper introduces Icentia11K, the largest public dataset of continuous raw ECG signals, covering 11,000 patients and 2 billion labelled beats. It is aimed at advancing unsupervised representation learning for arrhythmia subtype discovery and anomalous signal detection.
Contribution
It provides a large-scale ECG dataset and proposes an unsupervised representation learning task, evaluated in a semi-supervised fashion, to support the discovery of unknown arrhythmia subtypes and the development of semi-supervised ECG models.
Findings
PCA embeddings show some clustering of known arrhythmia subtypes, indicating potential for representation learning in subtype discovery.
Baselines for several feature extractors are provided as a starting point to build upon.
Dataset enables semi-supervised ECG modeling.
Abstract
We release the largest public ECG dataset of continuous raw signals for representation learning containing 11 thousand patients and 2 billion labelled beats. Our goal is to enable semi-supervised ECG models to be made as well as to discover unknown subtypes of arrhythmia and anomalous ECG signal events. To this end, we propose an unsupervised representation learning task, evaluated in a semi-supervised fashion. We provide a set of baselines for different feature extractors that can be built upon. Additionally, we perform qualitative evaluations on results from PCA embeddings, where we identify some clustering of known subtypes indicating the potential for representation learning in arrhythmia sub-type discovery.
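The qualitative PCA evaluation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the feature vectors below are synthetic stand-ins rather than Icentia11K data, and the function name `pca_embed`, the two made-up groups, and the 16-dimensional feature size are all assumptions chosen for illustration. It only shows how projecting features onto their top principal components can reveal group structure.

```python
import numpy as np

def pca_embed(features, n_components=2):
    """Project row-vector features onto their top principal components."""
    centered = features - features.mean(axis=0)
    # SVD of the centered data: rows of vt are the principal directions,
    # ordered by decreasing explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

rng = np.random.default_rng(0)

# Synthetic stand-in for beat-level feature vectors (NOT real ECG data):
# two hypothetical groups whose means differ in every dimension.
group_a = rng.normal(loc=0.0, scale=1.0, size=(100, 16))
group_b = rng.normal(loc=3.0, scale=1.0, size=(100, 16))
features = np.vstack([group_a, group_b])
labels = np.array([0] * 100 + [1] * 100)

embedding = pca_embed(features)

# Distance between the group centroids in the 2-D embedding; a large gap
# relative to the within-group spread suggests visible clustering.
centroid_a = embedding[labels == 0].mean(axis=0)
centroid_b = embedding[labels == 1].mean(axis=0)
centroid_gap = float(np.linalg.norm(centroid_a - centroid_b))
print(embedding.shape, round(centroid_gap, 2))
```

With real data, the rows of `features` would presumably come from a learned feature extractor and `labels` from the dataset's beat or rhythm annotations, and the embedding would typically be inspected as a scatter plot colored by label.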
Taxonomy
Topics: ECG Monitoring and Analysis · Cardiac electrophysiology and arrhythmias · Phonocardiography and Auscultation Techniques
Methods: Principal Components Analysis
