As easy as APC: overcoming missing data and class imbalance in time series with self-supervised learning
Fiorella Wever, T. Anderson Keller, Laura Symul, Victor Garcia

TL;DR
This paper demonstrates that Autoregressive Predictive Coding (APC), a self-supervised learning method, can effectively address both missing data and class imbalance in time series, improving classification performance especially in challenging real-world datasets.
Contribution
The work introduces a unified self-supervised approach using APC to handle missing data and class imbalance simultaneously without strong assumptions, outperforming existing methods.
Findings
APC significantly improves performance on synthetic data with missingness and imbalance.
APC achieves state-of-the-art AUPRC on the PhysioNet benchmark.
Consistent performance gains are observed on real-world medical datasets.
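AUPRC (area under the precision-recall curve) is commonly preferred under class imbalance because a random classifier scores roughly the positive-class prevalence rather than 0.5. The sketch below estimates AUPRC with the step-wise average-precision formula. It is only an illustration: the summary does not say which estimator the authors used, the function name `average_precision` and the toy labels and scores are hypothetical, and tied scores are not handled specially.

```python
import numpy as np


def average_precision(y_true, scores):
    """Step-wise estimate of AUPRC: mean precision at the rank of each positive."""
    y_true = np.asarray(y_true, dtype=int)
    scores = np.asarray(scores, dtype=float)
    # Rank examples from highest to lowest score.
    order = np.argsort(-scores, kind="stable")
    hits = y_true[order]
    n_pos = hits.sum()
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    # Precision after each ranked example: positives so far / examples so far.
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    # Average the precision values at the positions where a positive is found.
    return float((precision_at_k * hits).sum() / n_pos)


# Toy imbalanced example (made up): 2 positives among 10 samples.
y_true = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
ap = average_precision(y_true, scores)
prevalence = sum(y_true) / len(y_true)  # chance-level reference for AUPRC
```

Comparing `ap` against `prevalence` rather than against 0.5 gives a fairer sense of how much a model gains on a rare class.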
Abstract
High levels of missing data and strong class imbalance are ubiquitous challenges that are often presented simultaneously in real-world time series data. Existing methods approach these problems separately, frequently making significant assumptions about the underlying data generation process in order to lessen the impact of missing information. In this work, we instead demonstrate how a general self-supervised training method, namely Autoregressive Predictive Coding (APC), can be leveraged to overcome both missing data and class imbalance simultaneously without strong assumptions. Specifically, on a synthetic dataset, we show that standard baselines are substantially improved upon through the use of APC, yielding the greatest gains in the combined setting of high missingness and severe class imbalance. We further apply APC on two real-world medical time-series datasets, and show that…
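The abstract does not spell out the APC objective. As originally proposed for speech, APC trains an autoregressive model to predict the input frame a fixed number of steps ahead, using an L1 loss. The sketch below computes that loss for one sequence and, as an assumption made here for illustration, skips targets that were never observed. The function name `masked_apc_loss`, the `observed` mask, and the toy values are hypothetical and are not taken from the paper; the predictions would normally come from a recurrent encoder such as a GRU.

```python
import numpy as np


def masked_apc_loss(x, pred, observed, n_shift=1):
    """L1 loss between pred[t] and x[t + n_shift], over observed targets only.

    x        : (T, D) input series; missing entries may be NaN
    pred     : (T, D) model outputs, where pred[t] is the guess for x[t + n_shift]
    observed : (T, D) boolean mask, True where x was actually measured
    """
    x = np.asarray(x, dtype=float)
    pred = np.asarray(pred, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    if n_shift < 1 or n_shift >= x.shape[0]:
        raise ValueError("n_shift must be in [1, T - 1]")
    target = x[n_shift:]        # frames the model is asked to predict
    guess = pred[:-n_shift]     # predictions made n_shift steps earlier
    keep = observed[n_shift:]   # assumption: unobserved targets add no loss
    n_obs = keep.sum()
    if n_obs == 0:
        return 0.0
    return float(np.abs(target - guess)[keep].sum() / n_obs)


# Toy univariate series with one missing value (made-up numbers).
x = np.array([[1.0], [2.0], [np.nan], [4.0]])
observed = ~np.isnan(x)
pred = np.array([[2.5], [0.0], [3.0], [9.0]])
loss = masked_apc_loss(x, pred, observed, n_shift=1)
```

Because the objective needs no class labels, every sequence contributes to training regardless of its class, which is one plausible reason such pretraining helps when the positive class is rare.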
Taxonomy
Topics: Machine Learning in Healthcare · Artificial Intelligence in Healthcare · Imbalanced Data Classification Techniques
Methods: Gated Recurrent Unit
