Seizure Type Classification using EEG signals and Machine Learning: Setting a benchmark

Subhrajit Roy; Umar Asif; Jianbin Tang; and Stefan Harrer

arXiv:1902.01012·cs.LG·August 13, 2020

TL;DR

This paper demonstrates the effectiveness of machine learning algorithms in classifying seizure types from EEG signals, establishing a benchmark with high accuracy on a large dataset for scalp EEG-based seizure classification.

Contribution

It introduces a comprehensive evaluation of various machine learning techniques and preprocessing methods for multi-class seizure classification using the TUH EEG corpus, setting a new benchmark.

Findings

01

Achieved a weighted F1 score of 0.901 for seizure-wise validation.

02

Achieved a weighted F1 score of 0.561 for patient-wise validation.

03

Provided a thorough search space exploration for optimal model configurations.

Tables

Table 1: Seizure Type Statistics for v1.4.0

Seizure Type	Seizure Number	Duration (Seconds)	Patient Number
Focal Non-Specific (FNSZ)	992	73466	109
Generalized Non-Specific (GNSZ)	415	34348	44
Complex Partial (CPSZ)	342	33088	34
Absence (ABSZ)	99	852	13
Tonic (TNSZ)	67	1271	2
Tonic Clonic (TCSZ)	50	5630	11
Simple Partial (SPSZ)	44	1534	2
Myoclonic (MYSZ)	3	1312	2

Table 2: Seizure Type Statistics for v1.5.2

Seizure Type	Seizure Number	Duration (Seconds)	Patient Number
Focal Non-Specific (FNSZ)	1836	121139	150
Generalized Non-Specific (GNSZ)	583	59717	81
Complex Partial (CPSZ)	367	36321	41
Absence (ABSZ)	99	852	12
Tonic (TNSZ)	62	1204	3
Tonic Clonic (TCSZ)	48	5548	14
Simple Partial (SPSZ)	52	2146	3
Myoclonic (MYSZ)	3	1312	2

Table 3: V1.4.0 5-fold seizure-wise cross-validation results (weighted F1) on the four top performing hyperparameter sets for each pre-processing method.

Pre-processing	$f_{max}$ (Hz)	$W_{l}$ (s)	$O$ (s)	k-NN	SGD	XGBoost	CNN
Method 1	48	1	$0.75 W_{l}$	0.884	0.695	0.817	0.714
	24	1	$0.75 W_{l}$	0.883	0.621	0.844	0.722
	96	1	$0.75 W_{l}$	0.880	0.724	0.745	0.718
	24	1	$0.5 W_{l}$	0.879	0.604	0.766	0.713
Method 2	48	1	$0.75 W_{l}$	0.901	0.807	0.851	NA
	24	1	$0.75 W_{l}$	0.900	0.783	0.858	NA
	24	1	$0.5 W_{l}$	0.895	0.752	0.819	NA
	96	1	$0.75 W_{l}$	0.890	0.806	0.866	NA

Table 4: V1.5.2 3-fold patient-wise cross-validation results (weighted F1) on the four top performing hyperparameter sets for each pre-processing method.

Pre-processing	$f_{max}$ (Hz)	$W_{l}$ (s)	$O$ (s)	k-NN	SGD	XGBoost	CNN
Method 1	96	1	$0.75 W_{l}$	0.466	0.432	0.561	0.524
	24	1	$0.75 W_{l}$	0.437	0.384	0.559	0.530
	48	1	$0.75 W_{l}$	0.467	0.407	0.526	0.525
	24	1	$0.5 W_{l}$	0.423	0.390	0.512	0.504
Method 2	48	1	$0.75 W_{l}$	0.401	0.469	0.542	NA
	96	1	$0.75 W_{l}$	0.418	0.459	0.535	NA
	24	1	$0.5 W_{l}$	0.392	0.452	0.530	NA
	24	1	$0.75 W_{l}$	0.412	0.462	0.524	NA

Code & Models

Repositories

IBM/seizure-type-classification-tuh

Full text

Seizure Type Classification using EEG signals and Machine Learning: Setting a benchmark

Subhrajit Roy¹, Umar Asif², Jianbin Tang² and Stefan Harrer²

¹ IBM Research Australia, now with Google Health, London, UK
² IBM Research Australia, Melbourne, VIC, AU

[email protected], {umarasif, jbtang, sharrer}@au1.ibm.com

Abstract

Accurate classification of seizure types plays a crucial role in the treatment and disease management of epileptic patients. Epileptic seizure types not only impact the choice of drugs but also the range of activities a patient can safely engage in. With recent advances being made towards artificial intelligence enabled automatic seizure detection, the next frontier is the automatic classification of seizure types. On that note, in this paper, we explore the application of machine learning algorithms for multi-class seizure type classification. We used the recently released TUH EEG seizure corpus (V1.4.0 and V1.5.2) and conducted a thorough search space exploration to evaluate the performance of a combination of various pre-processing techniques, machine learning algorithms, and corresponding hyperparameters on this task. We show that our algorithms can reach a weighted $F1$ score of up to 0.901 for seizure-wise cross validation and 0.561 for patient-wise cross validation thereby setting a benchmark for scalp EEG based multi-class seizure type classification.

keywords: Seizure type classification, Machine learning, Electroencephalography

I. Introduction

Despite many new advances in drug therapy and disease understanding, our capabilities in treating and managing epilepsy are extremely limited. Roughly 1% of the world’s population, 65 million people, suffer from epilepsy [1]. For one third of these patients, no medical treatment options exist. These patients need to find ways to live with their condition and manage their daily lives around it. For the remaining two thirds of the patient population, medical treatment options are available but have vastly differing and constantly changing results and quality of treatment. These shortcomings in diagnosis and treatment options are caused by the fact that epilepsy is a highly individualized condition, i.e. it does not look the same in all patients, and even for an individual patient disease expression changes over time. As a result, until recently, the lack of data and measurements made the correct matching of patients and drugs an unnecessarily long process of trial and error. Manual diaries are the basic data source, but these have been proven to be only 50% accurate.

With the advent of mobile devices that allow patient information to be collected in real time, continuously and at the point of sensing, and with the help of miniaturization and IoT data collection platforms, new efforts are being directed towards building individualized patient management systems. Data that is more accurate and more extensive can be used to gain a patient-specific understanding of the disease and provide support for decision-making in managing it.

Machine Learning has been successfully used to address a large variety of problems in the biomedical field, ranging from image classification in cancer diagnosis to the automatic interpretation of electronic health records. Recently, we reported results demonstrating feasibility of using specialized neural networks to classify EEG data into normal/abnormal EEG [2] and to automatically detect and predict seizures [3]. In this paper, we expand on this work and discuss the feasibility of using machine learning algorithms for automatically distinguishing between different types of seizures as they are detected. This technology could support automatic, patient-specific seizure type logging in digital seizure diaries. Such seizure diaries could then be used to improve the performance of clinical trials through more efficient and reliable patient monitoring for endpoint detection, adherence control and patient retention [4].

II. Datasets

We used the TUH EEG Seizure Corpus (TUSZ) [5], which is the largest open source corpus of its type. This dataset includes the time of occurrence and type of each seizure.

The dataset covers a total of 8 different types of seizures:

Focal Non-Specific Seizure (FNSZ): Focal seizures not further specified by type.
Generalized Non-Specific Seizure (GNSZ): Generalized seizures not further classified into one of the groups below.
Simple Partial Seizure (SPSZ): Partial seizures during consciousness; type specified by clinical signs only.
Complex Partial Seizure (CPSZ): Partial seizures during unconsciousness; type specified by clinical signs only.
Absence Seizure (ABSZ): Absence discharges observed on EEG; patient loses consciousness for a few seconds (Petit Mal).
Tonic Seizure (TNSZ): Stiffening of body during seizure (EEG effects disappear).
Tonic Clonic Seizure (TCSZ): At first stiffening and then jerking of body (Grand Mal).
Myoclonic Seizure (MYSZ): Myoclonus jerks of limbs.

v1.4.0 of the dataset released in Oct 2018 contains 2012 seizures as shown in Table 1. v1.5.2 of the dataset released in May 2020 contains 3050 seizures as shown in Table 2. Since the number of MYSZ samples was too low for statistically meaningful analysis, we did not include MYSZ seizures in our study hence making it a 7-class classification problem.
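As a quick consistency check, the per-type counts in Table 1 can be summed and converted into class shares for the 7-class problem. This is only an illustrative sketch; the dictionary and variable names are ours and are not part of the released code.

```python
# Seizure counts per type, copied from Table 1 (v1.4.0).
SEIZURES_V140 = {
    "FNSZ": 992, "GNSZ": 415, "CPSZ": 342, "ABSZ": 99,
    "TNSZ": 67, "TCSZ": 50, "SPSZ": 44, "MYSZ": 3,
}

total_all = sum(SEIZURES_V140.values())

# Drop MYSZ, as in the study, to obtain the 7-class problem.
seven_class = {k: v for k, v in SEIZURES_V140.items() if k != "MYSZ"}
total_seven = sum(seven_class.values())

# Fraction of the remaining seizures that belongs to each class.
class_share = {k: v / total_seven for k, v in seven_class.items()}

for name, share in sorted(class_share.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {share:.1%}")
```

The shares make the class imbalance explicit: the largest class accounts for roughly half of all seizures, while the smallest classes contribute only a few percent each.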

III. Methods

In this section, we briefly discuss the data preparation strategies, pre-processing techniques, machine learning algorithms and hyperparameter tuning methodologies we have explored.

For pre-processing the dataset, we used two popular methods which have been reported to be effective in analysing EEG signals [6, 7]. In Method 1, we applied the Fast Fourier Transform (FFT) to each $W_{l}$-second clip, with $O$ seconds of overlap between consecutive clips, across all EEG channels. Next, we took $\log_{10}$ of the magnitudes of the frequencies in the range $1$ to $f_{max}$ Hz. After this operation, the dimension of each training sample becomes $(N,47)$, where $N$ is the number of EEG channels. For Method 2, the FFT is first applied to each $W_{l}$-second clip, with $O$ seconds of overlap, across all EEG channels. The output of the FFT is then clipped from 1 to $f_{max}$ Hz and normalized across frequency buckets. The $(N,N)$ correlation coefficient matrix is calculated from this normalized $(N,47)$ matrix. Eigenvalues are then calculated on this correlation coefficient matrix, with complex eigenvalues made real by taking the complex magnitude. We only considered the upper right triangle of the $(N,N)$ correlation coefficient matrix (since it is symmetric) and sorted the eigenvalues by magnitude.
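The two pre-processing methods can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the released implementation: the sampling rate `fs`, the half-open band `[1, f_max)` (chosen so that a 1-second window with `f_max = 48` yields 47 bins), standardizing each frequency bucket across channels, excluding the diagonal from the upper triangle, and the small `eps` constant are all our own choices, and all function names are hypothetical.

```python
import numpy as np


def sliding_windows(clip, fs, w_l, overlap):
    """Yield (channels, samples) windows of w_l seconds that overlap by `overlap` seconds."""
    win = int(round(w_l * fs))
    step = int(round((w_l - overlap) * fs))
    if step <= 0:
        raise ValueError("overlap must be smaller than the window length")
    for start in range(0, clip.shape[1] - win + 1, step):
        yield clip[:, start:start + win]


def band_magnitudes(window, fs, f_max):
    """FFT magnitudes per channel for frequencies in [1, f_max) Hz (assumed convention)."""
    freqs = np.fft.rfftfreq(window.shape[1], d=1.0 / fs)
    mags = np.abs(np.fft.rfft(window, axis=1))
    keep = (freqs >= 1.0) & (freqs < f_max)
    return mags[:, keep]


def method1_features(window, fs, f_max, eps=1e-12):
    """Method 1: log10 of the band-limited FFT magnitudes, shape (N, bins)."""
    return np.log10(band_magnitudes(window, fs, f_max) + eps)


def method2_features(window, fs, f_max, eps=1e-12):
    """Method 2: upper triangle of the channel correlation matrix plus sorted eigenvalue magnitudes."""
    mags = band_magnitudes(window, fs, f_max)
    # Assumption: standardize each frequency bucket (column) across channels.
    normed = (mags - mags.mean(axis=0)) / (mags.std(axis=0) + eps)
    corr = np.corrcoef(normed)  # (N, N); each row is one channel
    # Complex eigenvalues are made real by taking their magnitude, then sorted.
    eigvals = np.sort(np.abs(np.linalg.eigvals(corr)))
    # Assumption: the diagonal (all ones) is excluded from the upper triangle.
    upper = corr[np.triu_indices_from(corr, k=1)]
    return np.concatenate([upper, eigvals])
```

Method 1 keeps a 2D (channels by frequency) array per window, whereas Method 2 collapses each window into a 1D vector of correlation entries and eigenvalues.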

For classification, we used the following algorithms: k-Nearest Neighbors (k-NN), Stochastic Gradient Descent (SGD), XGBoost, and Convolutional Neural Networks (CNN). For the first three algorithms, we used HyperOpt [8] to choose the best hyperparameters. For CNN models, we used the popular ResNet50 [9] model and retrained the final layer for this task.

For cross validation, in v1.4.0 the TNSZ and SPSZ classes only contain data from 2 patients; therefore, patient-wise cross validation will not yield statistically meaningful results. Hence previous work in the field [10, 11] chose to apply 5-fold seizure-wise cross validation, in which the seizures from the different seizure types are equally and randomly allocated to 5 folds. In this scenario, train and test datasets can contain different seizure samples from the same patient. Since v1.4.0 of the dataset has been used for evaluation studies by multiple researchers [10, 11], we also include baseline results of our methods for v1.4.0 to allow a direct performance comparison to these studies. In v1.5.2 of the dataset, all 7 selected seizure types comprise data from 3 or more patients, which allows statistically meaningful 3-fold patient-wise cross validation. In this scenario, train and test datasets will always contain seizure samples from different patients. This approach makes it more challenging to boost model performance but has higher clinical relevance as it supports model generalisation across patients. For each seizure type, we randomly and equally allocate patients into each fold. We started with seizure types covering fewer patients and moved on to seizure types carried by more patients. For each seizure type, we exclude patients already allocated to previous seizure types. Since datasets of individual patients comprise a different number of seizures, each fold’s seizure number can vary largely. Hence we also investigated the impact of selecting different random seeds on the total number of seizures per fold and found that this had essentially no effect on the seizure number for each fold, which varied by only ±3 seizures.
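The patient-wise fold allocation described above can be sketched as follows. This is one possible reading of the procedure, assuming a round-robin assignment after shuffling to keep per-type patient counts balanced; the function name, the `patients_by_type` mapping and the toy patient identifiers are hypothetical.

```python
import random


def allocate_patients(patients_by_type, n_folds=3, seed=0):
    """Return {patient_id: fold_index}, processing rarer seizure types first."""
    rng = random.Random(seed)
    fold_of = {}
    # Seizure types covering fewer patients are handled first.
    for seizure_type in sorted(patients_by_type, key=lambda t: len(patients_by_type[t])):
        # Skip patients already placed while handling an earlier seizure type.
        remaining = sorted(p for p in patients_by_type[seizure_type] if p not in fold_of)
        rng.shuffle(remaining)
        # Round-robin keeps the per-type patient counts as equal as possible.
        for i, patient in enumerate(remaining):
            fold_of[patient] = i % n_folds
    return fold_of


# Toy example: one rare type with 3 patients, one common type with 7.
toy = {
    "TNSZ": {"p1", "p2", "p3"},
    "FNSZ": {"p1", "p4", "p5", "p6", "p7", "p8", "p9"},
}
folds = allocate_patients(toy, n_folds=3, seed=1)
```

Because every patient is assigned to exactly one fold, no patient can appear in both the training and the test split of any cross-validation round.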

To the best of our knowledge, this is the first seizure type classification study that provides a performance baseline for patient-wise cross validation.

IV. Experiments and Results

As a first step, to explore the design space in an efficient manner, we chose the two computationally fastest classifiers from Sec. III, namely the k-NN and SGD classifiers, and generated their weighted-F1 scores using both pre-processing methods for the first cross validation split. For the pre-processing hyperparameters $f_{max}$, $W_{l}$, and $O$, we generated results for all combinations of $f_{max}$ = {12, 24, 48, 64, 96} Hz, $W_{l}$ = {1, 2, 4, 8, 16} seconds, and $O$ = {$0.5 W_{l}$, $0.75 W_{l}$} seconds. The best hyperparameters of k-NN and SGD for each combination were automatically discovered by running HyperOpt for 100 iterations. Note that due to the heavy imbalance of the dataset we used the weighted-F1 score as the scoring metric.
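To make the size of this search space and the scoring metric concrete, the sketch below enumerates the pre-processing grid and computes a weighted F1 score in plain Python. It assumes the usual definition of weighted F1 (per-class F1 averaged with weights proportional to each class's number of true samples); the function and variable names are ours.

```python
from collections import Counter
from itertools import product

F_MAX = [12, 24, 48, 64, 96]       # Hz
W_L = [1, 2, 4, 8, 16]             # seconds
OVERLAP_FRACTION = [0.5, 0.75]     # O expressed as a fraction of W_l

# Every (f_max, W_l, O) combination, with O converted to seconds.
grid = [(f, w, frac * w) for f, w, frac in product(F_MAX, W_L, OVERLAP_FRACTION)]


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's share of y_true."""
    support = Counter(y_true)
    total = 0.0
    for label, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        total += f1 * n_true
    return total / len(y_true)


# Toy labels: three samples of a frequent class, one of a rare class.
score = weighted_f1(["FNSZ", "FNSZ", "FNSZ", "ABSZ"],
                    ["FNSZ", "FNSZ", "ABSZ", "ABSZ"])
```

The grid contains 5 × 5 × 2 = 50 pre-processing settings, each of which is combined with a separate classifier hyperparameter search. Weighting by class support means that frequent classes dominate the score, which is worth keeping in mind when reading results on such an imbalanced dataset.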

The above experiment served two purposes. Firstly, it allowed us to understand how the performance of the system varies with $f_{max}$, $W_{l}$ and $O$ separately, which is shown in Fig. 1. Upon inspecting the top row of Fig. 1, we find that while the performance is higher at a mid-range $f_{max}$ of 24 and 48 Hz, it drops at extreme frequencies. This probably happens since at lower $f_{max}$ we lose relevant information [12], and at higher $f_{max}$ the number of dimensions increases and the classifiers suffer from the curse of dimensionality. The second and third rows of Fig. 1 suggest that the performance increases when $W_{l}$ decreases and $O$ increases, respectively. We speculate that this happens since both the decrease of $W_{l}$ and the increase of $O$ lead to more samples in the training set.
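The effect of $W_{l}$ and $O$ on the number of training samples can be illustrated with a short calculation. The sketch assumes that only full windows are kept and that consecutive windows advance by $W_{l} - O$ seconds; the 60-second clip length is a hypothetical example, not a figure from the dataset.

```python
def n_windows(duration_s, w_l, overlap_fraction):
    """Number of full W_l-second windows in a clip when windows overlap by O = overlap_fraction * W_l."""
    if duration_s < w_l:
        return 0
    step = w_l * (1.0 - overlap_fraction)  # seconds between window starts
    return int((duration_s - w_l) // step) + 1


# Hypothetical 60-second seizure clip.
for w_l in (1, 16):
    for frac in (0.5, 0.75):
        print(f"W_l={w_l}s, O={frac}*W_l -> {n_windows(60, w_l, frac)} windows")
```

For a clip of fixed length, shortening the window or increasing the overlap both shrink the step between windows, so the same recording yields many more (highly overlapping) samples.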

Secondly, this design space exploration using simple classifiers revealed which combination of hyperparameters works best for both pre-processing methods. We selected the four top performing sets of hyperparameters and performed cross-validation on all the classifiers (5-fold seizure-wise for v1.4.0 and 3-fold patient-wise for v1.5.2, as described in Sec. III). Note that CNNs cannot be used to process the data from pre-processing Method 2 as it does not produce 2D data. As before, hyperparameters were chosen by running HyperOpt for 100 iterations. Tables 3 and 4 show the obtained average weighted-F1 scores for both pre-processing methods. The best performing models were k-NN, achieving a weighted-F1 score of 0.901 for v1.4.0, and XGBoost, reaching 0.561 for v1.5.2.

The results shown in Table 3 and Table 4 indicate that automated seizure type classification is possible using machine learning. k-NN and XGBoost are the best performing algorithms for analysing V1.4.0 and V1.5.2, respectively. We speculate that since V1.5.2 has more seizures than V1.4.0, this leads to more training samples and hence paves the way for a more complex algorithm to excel.

Automated detection and classification of seizures is the first step towards building a digital seizure diary which could enable the recording of patient-wise seizure metadata during clinical trials and in epilepsy monitoring units. Such information could then be used to tailor a patient-specific seizure suppression system using optimum medication dosages and suitable medical devices. The methods described in this paper may play an important role for building digital seizure diary technology in the future.

V. Conclusion

In this article, we performed the first exploratory study to show that machine learning techniques can be used to classify the type of detected seizures. We hope that automatic classification of seizure types will improve long-term patient care, enabling timely drug adjustments and remote monitoring. To promote research in this topic, we have released our pre-processed datasets for v1.4.0 [13] and intend to release pre-processed datasets for v1.5.2 and the code we used to generate the presented results.

Bibliography

[1] M. Patty Osborne Shafer RN, “About Epilepsy: The Basics,” https://www.epilepsy.com/learn/about-epilepsy-basics, 2014.
[2] S. Roy, I. Kiral-Kornek, and S. Harrer, “Deep learning enabled automatic abnormal EEG identification,” in 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). IEEE, 2018, pp. 2756–2759.
[3] I. Kiral-Kornek, S. Roy, E. Nurse, B. Mashford, P. Karoly, T. Carroll, D. Payne, S. Saha, S. Baldassano, T. O’Brien et al., “Epileptic seizure prediction using big data and deep learning: toward a mobile system,” EBioMedicine, vol. 27, pp. 103–111, 2018.
[4] S. Harrer, B. Antony, P. Shah, and J. Hu, “Artificial intelligence for clinical trial design,” Trends in Pharmacological Sciences, vol. 40, no. 8, pp. 577–591, 2019.
[5] V. Shah, E. Von Weltin, S. Lopez, J. R. McHugh, L. Veloso, M. Golmohammadi, I. Obeid, and J. Picone, “The Temple University Hospital seizure detection corpus,” Frontiers in Neuroinformatics, vol. 12, p. 83, 2018.
[6] Y. Paul, “Various epileptic seizure detection techniques using biomedical signals: a review,” Brain Informatics, vol. 5, no. 2, p. 6, 2018.
[7] K. Schindler, H. Leung, C. E. Elger, and K. Lehnertz, “Assessing seizure dynamics by analysing the correlation structure of multichannel intracranial EEG,” Brain, vol. 130, no. 1, pp. 65–77, 2007.
[8] J. Bergstra, B. Komer, C. Eliasmith, D. Yamins, and D. D. Cox, “Hyperopt: a Python library for model selection and hyperparameter optimization,” Computational Science & Discovery, vol. 8, no. 1, p. 014008, 2015.