Preventing Data Leakage in EEG-Based Survival Prediction: A Two-Stage Embedding and Transformer Framework
Yixin Zhou, Zhixiang Liu, Vladimir I. Zadorozhny, Jonathan Elmer

TL;DR
This paper identifies a hidden data leakage problem in EEG-based survival prediction models and proposes a two-stage framework with strict patient-level separation to improve reliability and generalization.
Contribution
It introduces a leakage-aware two-stage EEG modeling framework that prevents data leakage and enhances model robustness in outcome prediction.
Findings
The proposed framework maintains high sensitivity at strict specificity levels.
Violating patient-level separation inflates validation metrics but degrades test performance.
The method achieves stable, generalizable results on large-scale EEG data.
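The "sensitivity at strict specificity" metric in the findings can be illustrated with a minimal sketch. It assumes binary labels where 1 marks the positive class and higher scores mean more likely positive; the function name, the 0/1 encoding, and the toy scores are hypothetical and not taken from the paper. The idea is to pick the lowest decision threshold whose specificity meets the target, then report sensitivity at that threshold.

```python
def sensitivity_at_specificity(scores, labels, min_specificity=0.99):
    """Return (threshold, sensitivity, specificity) for the lowest threshold
    whose specificity is at least min_specificity.

    A case is predicted positive when its score is strictly above the threshold.
    Assumption: labels are 0 (negative) or 1 (positive).
    """
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    positives = [s for s, y in zip(scores, labels) if y == 1]
    if not negatives or not positives:
        raise ValueError("need at least one positive and one negative case")

    # Specificity only changes at negative-class scores, so those (plus -inf)
    # are the only thresholds worth checking. Ascending order means the first
    # threshold that qualifies also gives the highest sensitivity.
    candidates = [float("-inf")] + sorted(set(negatives))
    for threshold in candidates:
        specificity = sum(s <= threshold for s in negatives) / len(negatives)
        if specificity >= min_specificity:
            sensitivity = sum(s > threshold for s in positives) / len(positives)
            return threshold, sensitivity, specificity
    raise RuntimeError("unreachable: the largest negative score gives specificity 1.0")


# Toy example with made-up scores.
scores = [0.1, 0.2, 0.3, 0.8, 0.4, 0.7, 0.9, 0.95]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
strict = sensitivity_at_specificity(scores, labels, min_specificity=1.0)
relaxed = sensitivity_at_specificity(scores, labels, min_specificity=0.75)
```

In a leakage-aware evaluation the threshold would normally be chosen on validation patients and then applied unchanged to test patients; this sketch computes both on the same data only to keep the example short.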
Abstract
Deep learning models have shown promise in EEG-based outcome prediction for comatose patients after cardiac arrest, but their reliability is often compromised by subtle forms of data leakage. In particular, when long EEG recordings are segmented into short windows and reused across multiple training stages, models may implicitly encode and propagate label information, leading to overly optimistic validation performance and poor generalization. In this study, we identify a previously overlooked form of data leakage in multi-stage EEG modeling pipelines. We demonstrate that violating strict patient-level separation can significantly inflate validation metrics while causing substantial degradation on independent test data. To address this issue, we propose a leakage-aware two-stage framework. In the first stage, short EEG segments are transformed into embedding representations using a…
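The patient-level separation discussed in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: each EEG segment is assumed to carry the identifier of the patient it came from, and the function name, parameters, and toy identifiers are hypothetical. The split is made over patients rather than segments, so every window from a given recording lands on one side only.

```python
import random


def patient_level_split(segment_patient_ids, test_fraction=0.2, seed=0):
    """Split segment indices so that no patient appears in both partitions.

    segment_patient_ids[i] is the patient that segment i was cut from.
    Returns (train_idx, test_idx, test_patients).
    """
    # Shuffle the unique patients, not the segments.
    patients = sorted(set(segment_patient_ids))
    rng = random.Random(seed)
    rng.shuffle(patients)

    n_test = max(1, round(test_fraction * len(patients)))
    test_patients = set(patients[:n_test])

    train_idx = [i for i, p in enumerate(segment_patient_ids) if p not in test_patients]
    test_idx = [i for i, p in enumerate(segment_patient_ids) if p in test_patients]
    return train_idx, test_idx, test_patients


# Toy example: 12 segments from 5 hypothetical patients.
segment_patient_ids = ["p1"] * 3 + ["p2"] * 2 + ["p3"] * 4 + ["p4"] * 1 + ["p5"] * 2
train_idx, test_idx, test_patients = patient_level_split(
    segment_patient_ids, test_fraction=0.4, seed=0
)
train_patients = {segment_patient_ids[i] for i in train_idx}
```

For a multi-stage pipeline, the same patient partition would presumably need to be computed once and reused by every stage, so that an embedding model never sees segments from patients that a later stage validates or tests on.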
