Real-Time EEG Classification via Coresets for BCI Applications

Eitan Netzer; Alex Frid; Dan Feldman

arXiv:1901.00512·cs.DS·July 27, 2020


TL;DR

This paper introduces a novel coreset-based processing pipeline enabling real-time, high-quality EEG classification for BCI applications by efficiently summarizing streaming data and maintaining continuous learning.

Contribution

It proposes a coreset algorithm that allows real-time EEG feature extraction and classifier training, improving efficiency while preserving accuracy, applicable to BCI systems.

Findings

1. Real-time EEG signal learning demonstrated with 64 channels

2. Coreset-based CSP feature extraction achieves efficiency and accuracy

3. Open-source implementation provided for reproducibility


Tables

Table 1: Complexity bounds, coreset versus traditional CSP

Algorithm / Complexity	Time	Space
Traditional CSP	$O (d^{2} (t_{1} + t_{2} + d))$	$O (d (t_{1} + t_{2}))$
Coreset CSP	$O (d^{2})$	$O (d^{2})$

Table 2: Classification results, averaged across all participants

Method	Mean accuracy	Std
Coreset CSP + LDA	74.9%	14.1%
Traditional CSP + LDA	72.9%	13.2%

Table 3: Classification results as a confusion matrix (rows: true label, columns: predicted label), averaged across all participants

True label \ Predicted label	Left	Right
Left	0.751	0.249
Right	0.294	0.706

Equations

$w\in\arg\max_{w}\frac{\left\|wX_{1}\right\|^{2}}{\left\|wX_{2}\right\|^{2}}$

$w=\arg\max_{w}\frac{\left\|wC_{1,t}\right\|^{2}}{\left\|wC_{2,t}\right\|^{2}}$

$R_{i}=(U_{i}S_{i}V_{i})(U_{i}S_{i}V_{i})^{T}=U_{i}S_{i}V_{i}V_{i}^{T}S_{i}^{T}U_{i}^{T}=U_{i}S_{i}^{2}U_{i}^{T}$

$R_{2}^{-1}R_{1}=U_{2}S_{2}^{-2}U_{2}^{T}U_{1}S_{1}^{2}U_{1}^{T}$


Full text

Real-Time EEG Classification via Coresets for BCI Applications

Eitan Netzer

Alex Frid


Dan Feldman

Robotics and Big Data Lab, Computer Science department, University of Haifa, Haifa, Israel.

Laboratory of Clinical Neurophysiology, Faculty of Medicine, Technion IIT, Haifa, Israel.

Abstract

A brain-computer interface (BCI) based on the motor imagery (MI) paradigm translates one's motor intention into a control signal by classifying the electroencephalogram (EEG) signal of different tasks. However, most existing systems either (i) use a high-quality algorithm to train on the data off-line and run only classification in real time, since the off-line algorithm is too slow, or (ii) use low-quality heuristics that are fast enough for real-time training but introduce a relatively large classification error.

In this work, we propose a novel processing pipeline that allows real-time and parallel learning of EEG signals using high-quality but possibly inefficient algorithms. This is done by forging a link between BCI and coresets, a technique that originated in computational geometry for handling streaming data via data summarization.

We suggest an algorithm that maintains such a coreset representation, tailored to the EEG signal, which enables: (i) real-time and continuous computation of the Common Spatial Pattern (CSP) feature extraction method on a coreset representation of the signal (instead of on the signal itself), (ii) improvement of the CSP algorithm's efficiency, with provable guarantees, by applying the CSP algorithm to the coreset, and (iii) real-time addition of data trials (EEG data windows) to the coreset.

For simplicity, we focus on the CSP algorithm, which is a classic algorithm. Nevertheless, we expect that our coreset will be extended to other algorithms in future papers. In the experimental results, we show that our system can indeed learn EEG signals in real time, for example with a 64-channel setup and hundreds of time samples per second. Full open-source code is provided to reproduce the experiment, in the hope that it will be used and extended to more coresets and BCI applications in the future.

keywords:

Machine Learning, Coreset, Data Structures, On-line learning, Electroencephalogram (EEG), Brain Computer Interface (BCI)

Journal: Engineering Applications of Artificial Intelligence

1 Introduction

Brain-computer interfaces (BCIs) translate brain signals into a control signal without using one's actual movements or peripheral nerves. BCIs based on electroencephalogram (EEG) recordings have many advantages, such as short time constraints, fewer environmental limitations, and relatively inexpensive equipment. On the other hand, EEG introduces a high amount of noise and requires handling a large amount of data in real time. In addition, these systems usually require a time-consuming training phase.

In recent years, many techniques achieving high classification accuracy have been developed for EEG-MI based BCI systems. For example, the average accuracy of classifying imagined left and right hand movements can in some cases exceed 90% [1, 2]. Many of these techniques are based on Common Spatial Patterns (CSP) signal decomposition [3, 4, 5, 6, 7]. Nevertheless, these systems still have very limited usage in real-life applications. This is partially because the algorithms used in those systems focus on analysing multi-channel, densely sampled EEG signals, which require relatively expensive equipment due to the need for high processing power and memory.

An example of such a computational bottleneck in these systems is the CSP algorithm [8]. In essence, it is a "batch processing" algorithm: whenever a new sample is introduced, the algorithm needs to be retrained in order to find the updated spatial filters.

In this work, we present a method based on a coreset representation [9] of the EEG signal that can be executed prior to the CSP signal decomposition (see the visualization in Figure 1), and can thus reduce both the computational cost and the memory consumption without losing classification accuracy. This in turn allows cheaper, low-power hardware to be used in BCI devices.

1.1 Background

The common spatial pattern method was first used in EEG analysis to extract abnormal components from the clinical EEG [10], and was later adopted in BCI applications. In essence, this method weights each electrode according to its importance for the discrimination task and suppresses noise in individual channels by using correlations between neighboring electrodes. Let $X_{1}\in\mathbb{R}^{d\times t_{1}}$ and $X_{2}\in\mathbb{R}^{d\times t_{2}}$ be multivariate signals of dimension $d$, where $d$ is the number of electrodes or sensors and $t_{i}$ is the number of time samples. For example, in MI, $X_{1}$ and $X_{2}$ may represent the signals associated with a subject imagining moving their left or right hand, respectively. Over all $w\in\mathbb{R}^{d}$ such that $\left\|wX_{2}\right\|^{2}\neq 0$, CSP determines the component $w^{T}$ that maximizes the ratio of variance between $X_{1}$ and $X_{2}$ [5]:

$w\in\arg\max_{w}\frac{\left\|wX_{1}\right\|^{2}}{\left\|wX_{2}\right\|^{2}}\qquad(1)$

To solve the aforementioned problem [11], the CSP algorithm first computes the covariance matrices $R_{i}=\frac{X_{i}X_{i}^{T}}{t_{i}}$, where $i=1,2$, and then simultaneously diagonalizes both matrices ($R_{2}^{-1}R_{1}$) using a generalized eigenvalue decomposition. Since $X_{i}$ has rank $d$, $R_{i}$ has full rank and is invertible. Let $U$ be the matrix of eigenvectors and $D$ the diagonal matrix of eigenvalues $\left\{\lambda_{1},\lambda_{2},...,\lambda_{d}\right\}$ in decreasing order, such that $U^{T}R_{1}U=D$ and $U^{T}R_{2}U=I_{d}$, where $I_{d}$ is the identity matrix. Then $R_{2}^{-1}=UU^{T}$ and $R_{1}=U^{-T}DU^{-1}$, so $R_{2}^{-1}R_{1}=UU^{T}U^{-T}DU^{-1}=UDU^{-1}$. Hence the problem is equivalent to the eigendecomposition of $R_{2}^{-1}R_{1}$, and $w^{T}$ corresponds to the first column of $U$. A more detailed description is provided in Algorithm 1.
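The batch computation described above can be sketched in a few lines of NumPy. This is a minimal illustration under our own assumptions, not the authors' released code, and the function name `csp_batch` is hypothetical. Instead of forming $R_{2}^{-1}$ explicitly, it obtains the eigenvectors of $R_{2}^{-1}R_{1}$ from the equivalent symmetric matrix $L^{-1}R_{1}L^{-T}$, where $R_{2}=LL^{T}$ is the Cholesky factorization. This keeps the eigenvalues real.

```python
import numpy as np


def csp_batch(X1, X2):
    """Batch CSP sketch (hypothetical helper, not the authors' code).

    X1, X2: arrays of shape (d, t_i), channels x time samples, one per class.
    Returns (U, eigvals): the columns of U are the spatial filters sorted by
    decreasing eigenvalue, normalised so that U.T @ R2 @ U = I and
    U.T @ R1 @ U = diag(eigvals).
    """
    # Step 1: covariance matrices R_i = X_i X_i^T / t_i, O(d^2 (t_1 + t_2)) time
    R1 = X1 @ X1.T / X1.shape[1]
    R2 = X2 @ X2.T / X2.shape[1]
    # Step 2: eigenvectors of R2^{-1} R1 via the symmetric matrix L^{-1} R1 L^{-T},
    # where R2 = L L^T (Cholesky); both have the same real eigenvalues, O(d^3) time
    L_inv = np.linalg.inv(np.linalg.cholesky(R2))
    eigvals, V = np.linalg.eigh(L_inv @ R1 @ L_inv.T)
    U = L_inv.T @ V  # map back so that R2^{-1} R1 U = U diag(eigvals)
    # eigh returns eigenvalues in ascending order; CSP wants the largest first
    order = np.argsort(eigvals)[::-1]
    return U[:, order], eigvals[order]
```

`U[:, 0]` is then the filter $w$ of Eq. (1). Both covariance products touch every time sample, which is where the dependence on $t_{1}+t_{2}$ comes from.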

The main drawback of using this algorithm in real-time applications lies in its time and space complexity. Computing the covariance matrices (step 1 in Algorithm 1) takes $O\left(d^{2}\left(t_{1}+t_{2}\right)\right)$ time. Inverting a $d\times d$ matrix then takes $O\left(d^{3}\right)$ time, and finding the eigenvalues and eigenvectors also takes $O\left(d^{3}\right)$ time. The total time complexity is thus $O\left(d^{2}\left(t_{1}+t_{2}+d\right)\right)$, and the space (memory) complexity is $O\left(d\left(t_{1}+t_{2}\right)\right)$, where typically $d\ll t_{i}$ (due to the EEG's high sampling rate and its continuous operation). This dependency on the number of time samples rules out using CSP for real-time streaming. Our coreset-based algorithm has a fixed time and space complexity per newly added sample that does not depend on $t_{i}$ ($O\left(d^{2}\right)$ space; see Section 3 for the time analysis), allowing real-time streaming applications.
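The following back-of-the-envelope count compares the number of values each approach must store. The 64 channels and 160 Hz sampling rate are those of the dataset used in Section 4. The 60 seconds of recording per class is a hypothetical duration chosen only for illustration.

```latex
% d = 64 channels, 160 Hz (dataset values); 60 s per class is hypothetical
t_1 = t_2 = 160 \times 60 = 9600, \qquad
d\,(t_1 + t_2) = 64 \times 19200 = 1\,228\,800 \ \text{values (raw signal)}, \qquad
2d^2 = 2 \times 64^2 = 8192 \ \text{values (two } d \times d \text{ coresets)}.
```

The raw-signal count keeps growing with recording time. The coreset count stays at $2d^{2}$.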

Past attempts have been made to reduce this computational cost. For example, [12] proposed an incremental way to update the spatial filters whenever a new sample is introduced to the algorithm, i.e., the feature extraction is performed in an on-line fashion. In [13], an incremental learning algorithm was proposed that is based on incremental principal component analysis with a forgetting factor; this is essentially an adaptation of the algorithm presented in [14]. More generally, suppose $X\in\mathbb{R}^{d\times t}$ represents a set of sample points, where $d$ is the number of features and $t$ is the number of samples. Let $q\in Q$ be a query from a set of queries, or family of models, and let $f(X,q)$ be the cost of $q$ on $X$ that we wish to optimize, as in Eq. (1).

Nevertheless, those methods have one or all of the following disadvantages:

1. Running time. The time it takes to minimize $f(X,q)$ might be impractical. A possible solution is to use faster heuristics with no provable approximation guarantees, but the cost might be a weaker classifier. In the context of EEG, in many applications the brain signal is received in real time and the model must be updated within a fraction of a second.

2. Memory. Memory management issues arise for large signals that cannot fit into memory (RAM), or that fit but are too large to be processed by the optimization algorithm. In the context of EEG, the memory footprint (signal length) grows with the number of channels, the sampling frequency, and possibly the number of users.

3. On-line learning. Data points received on the fly from the signals are classified but are not used to update the model (i.e., for learning) to improve classification over time. In the context of EEG, we might get feedback from the user or from the real world regarding the last classification, and we wish to use this new information for the next samples.

4. Parallel computation. Even if the algorithm is fast enough, it may not be clear how to run it in parallel over several computation threads to reduce its running time and take advantage of the computational power of modern multi-core CPUs or GPUs. In the context of EEG, the input is also parallel when it is received from several users or from several BCI channels of the same user.

5. Distributed computation. Even if the algorithm supports parallel computation, it may not support distributed computation. Here, the input signal itself is partitioned between different machines (cloud, device) or threads, as in GPUs, that have no shared memory, and even a small amount of communication between them might be expensive. In the context of EEG, each user might be connected to a different computation device, but we aim for a single classifier. Similarly, when the signal is streamed to a cloud of machines, we need parallel computation with little shared memory via network communication, which is relatively expensive and slow.

6. Dynamic data. Even on-line or parallel algorithms are usually unable to handle deletion of samples from the signal. In the context of EEG, this is the case when we use the sliding window model. Here, we wish the classifier to represent only samples from the last $t$ seconds; that is, every time a new sample point arrives, the sample that was received $t$ seconds ago should be deleted. The classifier is then the one that minimizes $f(X,q)$ above, where $X$ is only the set of remaining samples.

7. Handling variations and constraints. In practice and in industry, we usually have constraints, business rules and expert rules that are based on very specific scenarios, signals, laws, users or applications. For example, we may want to minimize $f(X,q)+\left\lVert q\right\rVert$ instead of $f(X,q)$, where $\left\lVert q\right\rVert$ is a regularization term used to obtain a simpler classifier, or we may want to minimize $f(X,q)$ under specific constraints, where $q\in Q^{\prime}\subset Q$.

In this work, we present an alternative approach for optimizing the learning of the CSP algorithm and its variants by using a coreset representation of the data. This in turn satisfies the aforementioned requirements: it allows on-line learning, constant memory consumption, computational efficiency, and parallel and distributed computation, while supporting not only addition but also deletion of data. The rest of the paper is organized as follows. Section 2 provides an introduction to coresets and a formulation of coresets for EEG. Section 3 presents the proposed algorithm, its analysis and theorems. Section 4 compares the practical performance of the coreset-based CSP algorithm with a traditional algorithm on a well-known EEG dataset. Finally, Section 5 summarizes and concludes our work.

2 Related work: Coresets for EEG real-time processing

The term coreset was coined by Agarwal, Har-Peled and Varadarajan in [9]. Coresets first improved the running time of algorithms for many open problems in computational geometry (e.g. [15, 16, 17, 18]); see the surveys in [19, 20, 21]. Later, coresets were designed for obtaining the first PTAS or LTAS (polynomial/linear time approximation schemes) for more classic and graph problems in theoretical computer science [22, 23, 24, 25], and more recently under the name "composable coresets" [26, 27, 28]. Coresets are usually used when we want to approximate large data by a simple model with relatively few parameters, and are used less often for real-time systems as in this paper. In particular, in projective clustering [29, 15, 30, 31, 32, 33, 34, 35, 36] the model is a set of $k$ points, lines or subspaces, with an appropriate fitting cost. This is also a common setting in machine learning [37, 38, 39, 40, 41, 42, 43, 44, 45, 46]. More applied research was suggested, e.g., by Rus et al. [37, 47, 48, 49, 45], Krause [38, 43, 40, 41], Smola [46] or Sochen [50, 51, 52] in image processing with applications to medicine.

Improved techniques for using coresets for distributed data and low communication on the cloud, with both theoretical guarantees and experimental results, were recently suggested in data mining conferences, e.g. [53, 54]. Classical optimization techniques such as Frank-Wolfe [55] and semi-definite programming [56] appear to produce deterministic and smaller types of coresets. In numerical linear algebra, coresets were suggested for matrix approximations [57, 58, 59] using random projections, called sketches. The first coresets for signal processing, with applications to GPS or video data, were suggested in [48, 45, 47]. The first results for probabilistic databases appeared recently [60, 61].

In this work, we show that the coreset paradigm can improve the computations described in the previous sections, with provable guarantees on the complexity in terms of training/inference time and memory, also for EEG applications. In particular, we demonstrate how coresets can be used to train a classifier in real time for EEG signals. More details on coresets and the theoretical proofs for the computation models below can be found, e.g., in [62, 19, 63, 53].

As in the previous section, consider the problem of minimizing $f(X,q)$ over $q\in Q$, where $Q$ is a set of queries and $f$ is a function $f(X,q)\rightarrow\mathbb{R}^{+}$. In this paper, a coreset for this optimization problem, as in the coreset for CSP (where $f(X,w)=\left\|wX\right\|^{2}$), is another set $C$ such that $f(C,q)=f(X,q)$ for every $q\in Q$.

The fact that the coreset approximates the original data in the above sense is not sufficient to handle the computation models in the previous section. What we need for these is a composable coreset construction, i.e., one that satisfies two properties. First, the union of two coresets is a coreset. That is, coresets are mergeable in the sense that if $f(C_{a},q)$ approximates $f(X_{a},q)$ and $f(C_{b},q)$ approximates $f(X_{b},q)$, then $f(C_{a}\cup C_{b},q)$ approximates $f(X_{a}\cup X_{b},q)$ for every $q\in Q$. Second, we can compress a pair of coresets to obtain another coreset. Formally, we can compute a coreset $C^{\prime}$ such that $f(C^{\prime},q)$ approximates $f(C_{a}\cup C_{b},q)$ for every $q\in Q$. Using these two properties, we can build a coreset tree (see Fig. 2 for an example of such a tree).
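For the CSP cost $f(X,w)=\left\|wX\right\|^{2}$, both properties can be illustrated with a thin SVD. The union is a column concatenation, and the reduce step keeps $U\cdot S$ and drops $V$. The sketch below is a minimal NumPy illustration; the helper names `reduce_coreset` and `merge` are hypothetical. It checks that the merged $d\times d$ coreset gives the same cost as the union of the raw signals for random queries $w$.

```python
import numpy as np


def reduce_coreset(A):
    """Compress a d x m matrix A into a d x d matrix C with C @ C.T == A @ A.T.

    Hypothetical helper: thin SVD A = U S V, keep C = U S (V is dropped).
    """
    d = A.shape[0]
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    C = U * s  # same as U @ np.diag(s): scales column j of U by s[j]
    if C.shape[1] < d:  # A had fewer than d columns: pad with zero columns
        C = np.hstack([C, np.zeros((d, d - C.shape[1]))])
    return C


def merge(Ca, Cb):
    """Union of two coresets (column concatenation), then one reduce step."""
    return reduce_coreset(np.hstack([Ca, Cb]))
```

For this cost, the merge is exact rather than approximate, because $CC^{T}=C_{a}C_{a}^{T}+C_{b}C_{b}^{T}=X_{a}X_{a}^{T}+X_{b}X_{b}^{T}$.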

Running time

Let $s\geq 1$ be an integer such that if the input set $X$ to the coreset construction algorithm has size $|X|\leq s$, then the resulting coreset has cardinality $|C|\leq s/2$. Assume that this construction takes $g(s)$ time. As shown in Fig. 1, we can now merge-and-reduce a data set $X$ of size $n$ recursively to obtain its coreset under the above models. That is, we partition the input signal into subsets of samples, each of size $s/2$ (the leaves of the binary tree). At the next level of the tree, we take the union of the coresets in every pair of leaves (which consists of $s$ points) and reduce it back to $s/2$ points. This takes overall $(2n/s)\cdot g(s)$ time for the $2n/s$ leaves, which is linear in $n$ even if our coreset construction takes, say, $g(n)=n^{10}$ time for an input of $|X|=n$ points.

Streaming data

Handling streaming data can be done in a similar way, where the leaves arrive on the fly. Every pair of leaves is reduced to a single coreset, and the pair of leaves is then deleted. Hence, there is never more than one pair of leaves during the streaming. Similarly, whenever there are two inner nodes at a level of the tree, we reduce them to a single node at the next level up. At any given moment, there is at most a single coreset in each of the $O(\log n)$ levels of the tree for the $n$ points seen so far in the stream.
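The streaming procedure can be sketched in NumPy, assuming the SVD-based reduce step for $f(X,w)=\left\|wX\right\|^{2}$. Each level of the tree holds at most one $d\times d$ coreset. Inserting a leaf behaves like incrementing a binary counter: while the current level is occupied, merge with it and carry the result up. The class and method names are hypothetical. For this particular cost the reduce step is exact, so a single running coreset would also suffice; the tree bookkeeping is shown because the same structure applies to approximate coresets.

```python
import numpy as np


def reduce_coreset(A):
    """Compress a d x m matrix A to d x d C with C @ C.T == A @ A.T (thin SVD, keep U S)."""
    d = A.shape[0]
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    C = U * s
    if C.shape[1] < d:  # pad with zero columns when A has fewer than d columns
        C = np.hstack([C, np.zeros((d, d - C.shape[1]))])
    return C


class StreamingCoreset:
    """Merge-and-reduce over a stream: at most one d x d coreset per tree level."""

    def __init__(self, d):
        self.d = d
        self.levels = {}  # tree level -> coreset stored at that level

    def add_leaf(self, X_leaf):
        """Insert a new leaf, i.e. a d x m window of samples."""
        C, level = reduce_coreset(X_leaf), 0
        # Like incrementing a binary counter: merge with the occupied level, carry up
        while level in self.levels:
            C = reduce_coreset(np.hstack([self.levels.pop(level), C]))
            level += 1
        self.levels[level] = C

    def coreset(self):
        """Coreset of all samples seen so far: merge the O(log n) per-level coresets."""
        if not self.levels:
            return np.zeros((self.d, self.d))
        return reduce_coreset(np.hstack(list(self.levels.values())))
```

After $n$ leaves, the number of occupied levels equals the number of 1-bits in the binary representation of $n$, so it never exceeds $O(\log n)$.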

Distributed data

When the data is both streamed and distributed, say, to $M=2$ machines, we assume that every second point is sent to the second machine, and the remaining (odd) points are sent to the first machine. This can be done directly by the users or by a main server. Each machine then independently computes the merge-and-reduce tree of its points, as explained in the previous paragraph. The speed of computation and streaming then grows linearly with $M$. This is known as "embarrassingly parallel" independent computation. When a coreset of the complete data is needed, a main server can collect the coreset of each tree from each machine. This requires communication with the main server; however, since each machine sends only the coreset of its data, only $O(s)$ bits are sent in parallel from the $M$ machines.
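The odd/even split can be simulated on a single machine. In the NumPy sketch below, all names are hypothetical. Sample $j$ is routed to machine $j \bmod M$. Each machine folds its share into a $d\times d$ coreset independently, using a simple sequential fold rather than a full tree. The main server then merges only the $M$ small coresets. In a real deployment, the per-machine step would run in parallel.

```python
import numpy as np


def reduce_coreset(A):
    """Compress a d x m matrix A to d x d C with C @ C.T == A @ A.T (thin SVD, keep U S)."""
    d = A.shape[0]
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    C = U * s
    if C.shape[1] < d:  # pad with zero columns when A has fewer than d columns
        C = np.hstack([C, np.zeros((d, d - C.shape[1]))])
    return C


def machine_coreset(X_part, window=16):
    """Work done independently on one machine: fold its samples into a d x d coreset."""
    d = X_part.shape[0]
    C = np.zeros((d, d))
    for start in range(0, X_part.shape[1], window):
        C = reduce_coreset(np.hstack([C, X_part[:, start:start + window]]))
    return C


def distributed_coreset(X, M=2):
    """Route sample j to machine j % M, reduce locally, then merge on a main server."""
    parts = [X[:, m::M] for m in range(M)]       # M = 2: alternate samples per machine
    local = [machine_coreset(P) for P in parts]  # independent; could run in parallel
    # Each machine sends only its d x d coreset to the server
    return reduce_coreset(np.hstack(local))
```

The server receives $M$ matrices of size $d\times d$, regardless of how many samples each machine processed.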

Dynamic computations

To support deletion of input points (as in the sliding window model above), we need to store the complete merge-and-reduce tree as a balanced 2-3 tree whose leaves are the input points (a single point per leaf). Here, every inner node, which is the root of a sub-tree, contains the coreset of the leaves in this sub-tree. When a point (leaf) is deleted, we only need to update its $O(\log n)$ ancestors in the tree with their corresponding coresets. Recomputing these coresets takes $g(s)\cdot O(\log n)$ time per point insertion/deletion, which is only logarithmic in $n$, the number of points seen so far.
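To illustrate deletions, the NumPy sketch below replaces the balanced 2-3 tree with a simpler array-based complete binary tree over a fixed number of leaf slots. This is an assumption made for brevity, and all names are hypothetical. Every node stores the coreset of its sub-tree. Deleting a leaf sets it to the empty (zero) coreset and recomputes only its $O(\log n)$ ancestors. Writing window `i` into slot `i % capacity` turns the structure into a sliding window.

```python
import numpy as np


def reduce_coreset(A):
    """Compress a d x m matrix A to d x d C with C @ C.T == A @ A.T (thin SVD, keep U S)."""
    d = A.shape[0]
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    C = U * s
    if C.shape[1] < d:  # pad with zero columns when A has fewer than d columns
        C = np.hstack([C, np.zeros((d, d - C.shape[1]))])
    return C


class DynamicCoresetTree:
    """Complete binary tree over `capacity` leaf slots (simplified stand-in for a 2-3 tree).

    Node k has children 2k and 2k+1; node 1 is the root. Every node stores the
    coreset of the leaves below it; an empty slot holds the zero coreset.
    """

    def __init__(self, d, capacity):
        self.d = d
        self.size = 1
        while self.size < capacity:
            self.size *= 2
        self.nodes = np.zeros((2 * self.size, d, d))

    def _set_leaf(self, slot, C):
        k = self.size + slot
        self.nodes[k] = C
        k //= 2
        while k >= 1:  # recompute only the O(log n) ancestors of the changed leaf
            self.nodes[k] = reduce_coreset(np.hstack([self.nodes[2 * k], self.nodes[2 * k + 1]]))
            k //= 2

    def insert(self, slot, X_leaf):
        """Store the coreset of a d x m window in a slot (overwrites the old content)."""
        self._set_leaf(slot, reduce_coreset(X_leaf))

    def delete(self, slot):
        """Remove a leaf by replacing it with the empty (zero) coreset."""
        self._set_leaf(slot, np.zeros((self.d, self.d)))

    def coreset(self):
        return self.nodes[1]
```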

Real-time training as feedback

Our coreset allows real-time training of the model. Brain-related signals such as EEG are generated by a user. Using the reaction of the updated system (i.e., its inference) as feedback means that not only does the system "learn" the human participant, but the user also learns the system. This allows users to adjust their mental strategy to best control the system, shortening the training phase.

3 Proposed Algorithm

A full description of the algorithm flow is provided in this section, along with its graphical illustration (see Fig. 2). As can be seen in Fig. 2, the input signal is streamed either from the database (i.e., a real-time simulation) or from an EEG headset. After acquisition, the EEG signal undergoes a preprocessing stage, during which (i) various signal artifacts (such as eye blinks and loose electrodes) are checked for, and then (ii) the signal is band-passed to the frequencies containing MI information. In the next step, a coreset is fitted to the EEG data, which leads to a more compact representation of the signal.

The CSP algorithm is then applied to this compact representation to find the discriminative spatial filters. The last step of the algorithm is the classification of the MI task type (i.e., left or right hand movement), performed using the Linear Discriminant Analysis (LDA) algorithm.

Each of these steps is described in detail below:

The EEG signal $X_{i,t}$ at time $t$ of each class $i\in\left\{1,2\right\}$ is represented using a coreset in the following way:

For each new time sample (or processing window), the coreset compresses the signal so that its size is bounded by the number of electrode leads (i.e., sensors). When a new time sample enters the system, the current signal is represented by a $d\times\left(d+1\right)$ matrix (the $d\times d$ coreset plus the new sample). To compress it back to $d\times d$, an SVD is first applied, $U,S,V=svd\left(X_{i,t}\right)$. Here $S$ is the diagonal matrix of singular values, and $U$ and $V$ hold the left and right singular vectors, so that $X_{i,t}=USV$.

Let $Y=US$. Since $V$ has orthonormal rows, $XX^{T}=\left(YV\right)\left(YV\right)^{T}=YVV^{T}Y^{T}=YY^{T}$, and therefore $\left\|wX\right\|^{2}=\left\|wU\cdot S\right\|^{2}$ for every $w\in\mathbb{R}^{d}$. We then update the coreset representation of the signal to be the $d\times d$ matrix $C_{i,t}=U\cdot S$, and repeat this procedure for each incoming sample (or processing window). The time complexity of adding a sample to the coreset is $O\left(d^{3}\right)$ and the space complexity is $O\left(d^{2}\right)$, where $d$ is typically small. See Algorithm 2.
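The per-sample update can be sketched as follows. This is our NumPy reading of the procedure described above, not the authors' Algorithm 2, and the function name `add_samples` is hypothetical. Start from an all-zero $d\times d$ coreset. Each call appends either a single sample or a $d\times k$ window and compresses the result back to $d\times d$. After every step, $C_{i,t}C_{i,t}^{T}=X_{i,t}X_{i,t}^{T}$.

```python
import numpy as np


def add_samples(C, x):
    """Append one sample (length d) or a d x k window to a d x d coreset C.

    Hypothetical helper; returns C_new (d x d) with
    C_new @ C_new.T == C @ C.T + X_new @ X_new.T.
    """
    d = C.shape[0]
    A = np.hstack([C, np.reshape(x, (d, -1))])       # d x (d + k) matrix
    U, s, _ = np.linalg.svd(A, full_matrices=False)  # A = U S V with V V^T = I
    return U * s                                     # new coreset U S (d x d)
```

Each call costs one thin SVD of a $d\times(d+k)$ matrix, i.e. $O(d^{2}(d+k))$ time. This is $O(d^{3})$ for a single sample, and the cost does not depend on how many samples have already been absorbed.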

Additionally, we show that it is possible to concatenate a window of samples to a coreset.

Lemma 1.

For every $w\in\mathbb{R}^{d}$: $\left\|w\begin{bmatrix}US & x_{n+1}\end{bmatrix}\right\|^{2}=\left\|w\begin{bmatrix}US & x_{n+1}\end{bmatrix}\begin{bmatrix}V&0\\0&1\end{bmatrix}\right\|^{2}=\left\|w\begin{bmatrix}USV & x_{n+1}\end{bmatrix}\right\|^{2}=\left\|w\begin{bmatrix}X_{1,2,...,n} & x_{n+1}\end{bmatrix}\right\|^{2}$. The notation is as follows:

- $\begin{bmatrix}X_{1,2,...,n} & x_{n+1}\end{bmatrix}$ is the concatenation of the matrix $X_{1,2,...,n}\in\mathbb{R}^{d\times n}$ of samples $1$ to $n$ with the vector $x_{n+1}\in\mathbb{R}^{d}$ of sample $n+1$.
- $X_{1,2,...,n}=USV$ is its SVD, with $S\in\mathbb{R}^{d\times d}$ the diagonal matrix of singular values and $V\in\mathbb{R}^{d\times n}$ a matrix with orthonormal rows.
- $\begin{bmatrix}US & x_{n+1}\end{bmatrix}$ is the concatenation of samples $1$ to $n$ after SVD compression with the vector $x_{n+1}$ of sample $n+1$.

The first equality holds because $B=\begin{bmatrix}V&0\\0&1\end{bmatrix}$ has orthonormal rows ($BB^{T}=I_{d+1}$), so $\left\|wAB\right\|^{2}=\left\|wA\right\|^{2}$ for $A=\begin{bmatrix}US & x_{n+1}\end{bmatrix}$.

3.1 Common Spatial Patterns and Coresets

At every new sample, the coreset signals $C_{i,t}$ of both classes $i=1,2$ are used to maximize the following expression:

$w=\arg\max_{w}\frac{\left\|wC_{1,t}\right\|^{2}}{\left\|wC_{2,t}\right\|^{2}}$

where $w$ is the real-time equivalent of Eq. (1). Using the coreset signal, we can compute the CSP component from fewer samples and much faster. Since $S_{i}$ is a diagonal matrix and $V_{i}$ has orthonormal rows,

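$R_{i}=(U_{i}S_{i}V_{i})(U_{i}S_{i}V_{i})^{T}=U_{i}S_{i}V_{i}V_{i}^{T}S_{i}^{T}U_{i}^{T}=U_{i}S_{i}^{2}U_{i}^{T}$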

so the covariance matrix is $R_{1}=U_{1}S_{1}^{2}U_{1}^{T}$. Computing $S_{1}^{2}$ only requires squaring the main diagonal, since $S_{1}$ is a diagonal matrix. Similarly, $R_{2}^{-1}=\left(U_{2}S_{2}^{2}U_{2}^{T}\right)^{-1}=U_{2}S_{2}^{-2}U_{2}^{T}$. Again, $S_{2}^{-2}$ is computed on the main diagonal only, and $U_{2}^{-1}=U_{2}^{T}$ because $U_{2}$ is a real unitary (orthogonal) matrix. The problem is thus reduced to computing the eigendecomposition of:

$R_{2}^{-1}R_{1}=U_{2}S_{2}^{-2}U_{2}^{T}U_{1}S_{1}^{2}U_{1}^{T}$

The complexity of adding a sample is given by Algorithm 2. The time complexity of calculating the covariance matrix and its inverse is $O\left(d^{2}\right)$, because $S_{i}$ is a diagonal matrix and $U_{i}$ is a real unitary matrix. Finding the eigenvalues and eigenvectors takes $O\left(d^{3}\right)$ time, resulting in a total time complexity of $O\left(d^{3}\right)$ and a space complexity of $O\left(d^{2}\right)$. See Algorithm 3 for additional details. Figure 1 depicts graphically the advantage of using coresets for CSP computation. Each leaf in the graph represents a spatial filter (i.e., $w$) computed by the CSP algorithm.
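The reduction above can be sketched in NumPy, assuming each class's coreset is kept in factored form $(U_{i},s_{i})$ together with its sample count $t_{i}$, so that $C_{i}=U_{i}\,\mathrm{diag}(s_{i})$ and $R_{i}=U_{i}S_{i}^{2}U_{i}^{T}/t_{i}$. The function name is hypothetical. Rather than calling a non-symmetric eigensolver on $R_{2}^{-1}R_{1}$, the sketch writes $R_{2}^{-1}=PP^{T}$ with $P=U_{2}\sqrt{t_{2}}\,S_{2}^{-1}$, which inverts only a diagonal. It then solves the symmetric problem $P^{T}R_{1}P$, which has the same eigenvalues as $R_{2}^{-1}R_{1}$. Mapping the top eigenvector through $P$ gives the CSP filter.

```python
import numpy as np


def csp_top_filter_from_coreset(U1, s1, t1, U2, s2, t2):
    """Top CSP filter from coreset factors (hypothetical helper).

    Coreset of class i: C_i = U_i diag(s_i), so R_i = U_i S_i^2 U_i^T / t_i.
    Assumes every entry of s2 is non-zero (R_2 invertible).
    """
    # R_2^{-1} = U_2 diag(t_2 / s_2^2) U_2^T = P P^T: only the diagonal is inverted
    P = U2 * (np.sqrt(t2) / s2)
    R1 = (U1 * (s1 ** 2 / t1)) @ U1.T
    # R_2^{-1} R_1 = P (P^T R_1 P) P^{-1}, so both share eigenvalues; the symmetric
    # middle factor lets us use eigh (real eigenvalues, ascending order)
    lam, V = np.linalg.eigh(P.T @ R1 @ P)
    w = P @ V[:, -1]  # eigenvector of R_2^{-1} R_1 for the largest eigenvalue
    return w / np.linalg.norm(w), lam[-1]
```

Nothing in this function depends on $t_{1}$ or $t_{2}$ beyond two scalars, so its cost is fixed as the stream grows.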

When an additional trial is presented to the system, the traditional CSP computation requires re-computation over all the previous EEG trials along with the new one. In comparison, the "coreset-based CSP" only requires merging two coresets. Additionally, this coreset representation allows parallel computation and the removal of trials, or even groups of trials, from the CSP filters without recomputing the CSP algorithm on the remaining data.

3.2 Classification Scheme

A common classifier for BCI and MI tasks is Linear Discriminant Analysis (LDA) [64]. LDA is a generalization of the "classical" Fisher's linear discriminant, frequently used in statistical pattern recognition to find a linear combination of features that separates classes [65, 66]. Let $x$ be a feature vector and $y$ a known class label. LDA assumes that the conditional probability density functions $p\left(x\mid y=1\right)$ and $p\left(x\mid y=2\right)$ are normally distributed with mean and covariance parameters $\left(\mu_{1},\varSigma\right)$ and $\left(\mu_{2},\varSigma\right)$, respectively, where $\varSigma$ is Hermitian (symmetric) and invertible. It predicts using the Bayes-optimal log-likelihood-ratio test with threshold $T$, which reduces to $\left(\varSigma^{-1}\left(\mu_{2}-\mu_{1}\right)\right)^{T}x>C$, where $C$ is the constant $C=\frac{1}{2}\left(T-\mu_{1}^{T}\varSigma^{-1}\mu_{1}+\mu_{2}^{T}\varSigma^{-1}\mu_{2}\right)$. Prediction therefore relies only on the dot product $w\cdot x>C$, where $w=\varSigma^{-1}\left(\mu_{2}-\mu_{1}\right)$.
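The decision rule above can be sketched in NumPy. This is a minimal illustration, not the classifier configuration used in the experiments, and the function names are hypothetical. The shared covariance $\varSigma$ is estimated by the pooled within-class covariance, which is an assumption. The features are assumed to have at least two dimensions, for example several CSP-derived features per trial.

```python
import numpy as np


def lda_fit(F1, F2, T=0.0):
    """Two-class LDA with a shared covariance (hypothetical helper).

    F1, F2: (n_i, k) feature matrices for classes 1 and 2, with k >= 2.
    Returns (w, C) for the rule: predict class 2 if w . x > C, else class 1.
    """
    mu1, mu2 = F1.mean(axis=0), F2.mean(axis=0)
    n1, n2 = len(F1), len(F2)
    # Pooled within-class covariance as the estimate of the shared Sigma
    Sigma = ((n1 - 1) * np.cov(F1, rowvar=False)
             + (n2 - 1) * np.cov(F2, rowvar=False)) / (n1 + n2 - 2)
    w = np.linalg.solve(Sigma, mu2 - mu1)  # w = Sigma^{-1} (mu2 - mu1)
    C = 0.5 * (T - mu1 @ np.linalg.solve(Sigma, mu1) + mu2 @ np.linalg.solve(Sigma, mu2))
    return w, C


def lda_predict(w, C, F):
    """Label each row of F as 2 if w . x > C, else 1."""
    return np.where(F @ w > C, 2, 1)
```

With $T=0$, the threshold simplifies to $C=w\cdot\frac{\mu_{1}+\mu_{2}}{2}$, the projection of the midpoint between the two class means.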

4 Results

Hardware: a four-core i7 laptop with 8 GB RAM.

Input Data: For our system evaluation, we use the motor imagery right/left hand task from the EEG Motor Movement/Imagery dataset, which was created and contributed to PhysioNet [67] by the developers of the BCI2000 instrumentation system [68]. The dataset was recorded using 64 electrodes in the 10-10 system arrangement, with a sampling frequency of 160 Hz. The dataset includes 109 participants, with about 44 trials for each MI task (depending on the artifact rejection process applied).

Pre-Processing: During this step, several sub-routines are applied, as depicted in Fig. 2 (second stage). First, an artifact rejection process is applied in order to (i) remove eye blinks and (ii) detect loose or noisy electrodes. Second, the signal is band-passed to 0.5-8 Hz in order to focus on the delta and theta frequencies, which are known to be related to sensorimotor activity [69].

Evaluation: The goal of the experiment was to compare the time, memory consumption and accuracy of traditional CSP versus coreset-based CSP, computed at each time sample. To evaluate our method, we compare it with a traditional CSP-based MI-BCI using the following criteria:

1. First, we show that the $w^{T}$ component in both methods reaches the same solution.

2. Then, a visualization of the 4 best CSP components is compared.

3. Next, we compare the classification accuracy.

4. Last, we present the time and memory allocation used by both methods.

To show that the $w^{T}$ component reaches the same solution, we computed the ratio $\frac{\left\|wX_{1}\right\|^{2}}{\left\|wX_{2}\right\|^{2}}/\frac{\left\|\tilde{w}X_{1}\right\|^{2}}{\left\|\tilde{w}X_{2}\right\|^{2}}$, where $\tilde{w}$ is calculated using the "coreset-based CSP" and $w$ using the traditional (batch) CSP algorithm.

Measurements: The ratio was computed per sample (i.e., to simulate real-time data acquisition conditions). As can be seen in Fig. 4(a), the ratio remains stable around 1.

In Fig. 3, we visualize the four largest components computed by each method. The top row shows the coreset-based CSP and the bottom row shows the traditional CSP, with the highest component on the left. It can be seen that the selected CSP weights are the same.

Table 2 shows the single-trial classification accuracy across all 109 participants, using leave-one-out cross-validation. The first row presents the average classification accuracy based on CSP computed from EEG data approximated using the coreset, and the second row shows the result of the traditional CSP-based algorithm. It can be seen that the classification result is not degraded by the coreset approximation of the signal.

Table 3 presents the classification results in more detail as a confusion matrix, showing the type I and type II errors.

Figures 4(b) and 4(c) present the computation time and the memory consumption of both methods, respectively. The coreset-based algorithm is superior in terms of memory allocation from the very beginning, and its memory usage remains at the same level over time, as opposed to the traditional CSP-based BCI algorithm. In addition, the coreset-based algorithm is computationally efficient: its computation time stays the same regardless of the number of signal samples used.

5 Conclusions

We showed that coresets can indeed be used to learn EEG signals in real time and on dynamic data by applying existing algorithms to these coresets. Our theoretical and experimental results demonstrate that this can be done with 64 channels and hundreds of time samples per second, without decreasing the accuracy of the system. Additionally, we showed that the coreset-based compact representation allows parallel computation and the removal (in real time) of bad trials or outliers from the system.

A real-time EEG system is valuable for interactive applications such as neurofeedback. Immediate feedback lets the subject learn the system's behavior and “teach” themselves to control their thoughts so as to improve the system's output. Such abilities can shorten the training period for EEG tasks and result in better, more adaptive, or personalized systems.

Finally, we showed that by using a coreset approximation of the EEG signal, cheaper or low-power hardware (with less memory and computational power) can be used for training and running BCI systems.

The full source code is provided as open source [70], in the hope that it will be used and extended to more coresets and BCI applications in the future.

References

[1]

H. Ramoser, J. Muller-Gerking, G. Pfurtscheller, Optimal spatial filtering of single trial EEG during imagined hand movement, IEEE Transactions on Rehabilitation Engineering 8 (4) (2000) 441–446.

[2]

G. Dornhege, B. Blankertz, G. Curio, K.-R. Muller, Boosting bit rates in noninvasive EEG single-trial classifications by feature combination and multiclass paradigms, IEEE Transactions on Biomedical Engineering 51 (6) (2004) 993–1002.

[3]

K. K. Ang, Z. Y. Chin, H. Zhang, C. Guan, Filter bank common spatial pattern (FBCSP) in brain-computer interface, in: Neural Networks, 2008. IJCNN 2008 (IEEE World Congress on Computational Intelligence). IEEE International Joint Conference on, IEEE, 2008, pp. 2390–2397.

[4]

Z. J. Koles, M. S. Lazar, S. Z. Zhou, Spatial patterns underlying population differences in the background EEG, Brain Topography 2 (4) (1990) 275–284.

[5]

G. Dornhege, B. Blankertz, M. Krauledat, F. Losch, G. Curio, K.-R. Müller, Optimizing spatio-temporal filters for improving brain-computer interfacing, in: Advances in Neural Information Processing Systems, 2006, pp. 315–322.

[6]

G. Pfurtscheller, C. Guger, H. Ramoser, EEG-based brain-computer interface using subject-specific spatial filters, Engineering Applications of Bio-Inspired Artificial Neural Networks (1999) 248–254.

[7]

K. Fukunaga, Introduction to statistical pattern recognition, Academic press, 2013.

[8]

B. Blankertz, R. Tomioka, S. Lemm, M. Kawanabe, K.-R. Muller, Optimizing spatial filters for robust EEG single-trial analysis, IEEE Signal Processing Magazine 25 (1) (2008) 41–56.

[9]

P. K. Agarwal, S. Har-Peled, K. R. Varadarajan, Approximating extent measures of points, Journal of the ACM 51 (4) (2004) 606–635.

[10]

Z. J. Koles, The quantitative extraction and topographic mapping of the abnormal components in the clinical EEG, Electroencephalography and Clinical Neurophysiology 79 (6) (1991) 440–447.

[11]

P. Legendre, M. J. Fortin, Spatial pattern and ecological analysis, Vegetatio 80 (2) (1989) 107–138.

[12]

Q. Zhao, L. Zhang, A. Cichocki, J. Li, Incremental common spatial pattern algorithm for BCI, in: Neural Networks, 2008. IJCNN 2008 (IEEE World Congress on Computational Intelligence). IEEE International Joint Conference on, IEEE, 2008, pp. 2656–2659.

[13]

D. A. Ross, J. Lim, R.-S. Lin, M.-H. Yang, Incremental learning for robust visual tracking, International journal of computer vision 77 (1-3) (2008) 125–141.

[14]

A. Levey, M. Lindenbaum, Sequential Karhunen-Loeve basis extraction and its application to images, IEEE Transactions on Image Processing 9 (8) (2000) 1371–1374.

[15]

P. K. Agarwal, C. M. Procopiuc, Approximation algorithms for projective clustering, in: Proc. 11th Annu. ACM-SIAM Symp. on Discrete Algorithms (SODA), 2000, pp. 538–547.

[16]

P. K. Agarwal, C. M. Procopiuc, K. R. Varadarajan, Approximation algorithms for k-line center, in: Proc. 10th Annu. European Symp. on Algorithms (ESA), Vol. 2461 of Lecture Notes in Computer Science, Springer, 2002, pp. 54–63.

[17]

S. Har-Peled, Clustering motion, Discrete Comput. Geom. 31 (4) (2004) 545–565.

URL http://dx.doi.org/10.1007/s00454-004-2822-7

[18]

D. Feldman, M. Monemizadeh, C. Sohler, A PTAS for k-means clustering based on weak coresets, in: Proc. 23rd ACM Symp. on Computational Geometry (SoCG), 2007.

[19]

P. K. Agarwal, S. Har-Peled, K. R. Varadarajan, Geometric approximations via coresets, Combinatorial and Computational Geometry - MSRI Publications 52 (2005) 1–30.

[20]

A. Czumaj, C. Sohler, Sublinear-time approximation algorithms for clustering via random sampling, Random Struct. Algorithms (RSA) 30 (1-2) (2007) 226–256.

URL http://dx.doi.org/10.1002/rsa.20157

[21]

J. M. Phillips, Coresets and sketches, near-final version of chapter 49 in Handbook of Discrete and Computational Geometry, 3rd edition, CoRR abs/1601.00617.

URL http://arxiv.org/abs/1601.00617

[22]

G. Frahling, C. Sohler, Coresets in dynamic geometric data streams, in: Proc. 37th Annu. ACM Symp. on Theory of Computing (STOC), 2005, pp. 209–217.

[23]

A. Czumaj, F. Ergün, L. Fortnow, A. Magen, I. Newman, R. Rubinfeld, C. Sohler, Approximating the weight of the Euclidean minimum spanning tree in sublinear time, SIAM Journal on Computing 35 (1) (2005) 91–109.

[24]

G. Frahling, P. Indyk, C. Sohler, Sampling in dynamic data streams and applications, Int. J. Comput. Geometry Appl. 18 (1/2) (2008) 3–28.

URL http://dx.doi.org/10.1142/S0218195908002520

[25]

L. S. Buriol, G. Frahling, S. Leonardi, C. Sohler, Estimating clustering indexes in data streams, in: Proc. 15th Annu. European Symp. on Algorithms (ESA), Vol. 4698 of Lecture Notes in Computer Science, Springer, 2007, pp. 618–632.

URL http://dx.doi.org/10.1007/978-3-540-75520-3_55

[26]

P. Indyk, S. Mahabadi, M. Mahdian, V. S. Mirrokni, Composable core-sets for diversity and coverage maximization, in: Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, ACM, 2014, pp. 100–108.

[27]

V. Mirrokni, M. Zadimoghaddam, Randomized composable core-sets for distributed submodular maximization, in: Proceedings of the Forty-Seventh Annual ACM on Symposium on Theory of Computing, ACM, 2015, pp. 153–162.

[28]

S. Aghamolaei, M. Farhadi, H. Zarrabi-Zadeh, Diversity maximization via composable coresets, in: Proceedings of the 27th Canadian Conference on Computational Geometry, 2015.

[29]

D. Feldman, M. Schmidt, C. Sohler, Turning big data into tiny data: Constant-size coresets for k-means, PCA and projective clustering, in: Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, SIAM, 2013, pp. 1434–1453.

[30]

A. Deshpande, L. Rademacher, S. Vempala, G. Wang, Matrix approximation and projective clustering via volume sampling, in: Proc. 17th Annu. ACM-SIAM Symp. on Discrete Algorithms (SODA), 2006, pp. 1117–1126.

[31]

S. Har-Peled, K. R. Varadarajan, Projective clustering in high dimensions using coresets, in: Proc. 18th ACM Symp. on Computational Geometry (SoCG), 2002, pp. 312–318.

[32]

P. K. Agarwal, M. Jones, T. M. Murali, C. M. Procopiuc, A Monte Carlo algorithm for fast projective clustering, in: Proc. ACM-SIGMOD Int. Conf. on Management of Data, 2002, pp. 418–427.

URL http://doi.acm.org/10.1145/564691.564739

[33]

M. Bouguessa, S. Wang, Q. Jiang, A K-means-based algorithm for projective clustering, in: Int. Conf. on Pattern Recognition, 2006, pp. 888–891.

URL http://doi.ieeecomputersociety.org/10.1109/ICPR.2006.88

[34]

P. K. Agarwal, N. H. Mustafa, $k$-means projective clustering, in: Proc. 23rd ACM SIGACT-SIGMOD-SIGART Symp. on Principles of Database Systems (PODS), 2004, pp. 155–165.

URL http://doi.acm.org/10.1145/1055558.1055581;http://www.acm.org/sigmod/pods/proc04/pdf/P-16.pdf

[35]

A. Deshpande, L. Rademacher, S. Vempala, G. Wang, Matrix approximation and projective clustering via volume sampling, in: SODA: ACM-SIAM Symposium on Discrete Algorithms (A Conference on Theoretical and Experimental Analysis of Discrete Algorithms), 2006.

[36]

K. Varadarajan, X. Xiao, On the sensitivity of shape fitting problems, arXiv preprint arXiv:1209.4893.

[37]

D. Feldman, M. Volkov, D. Rus, Dimensionality reduction of massive sparse datasets using coresets, in: Advances in neural information processing systems (NIPS), 2016.

[38]

D. Feldman, M. Faulkner, A. Krause, Scalable training of mixture models via coresets, in: Advances in neural information processing systems (NIPS), 2011, pp. 2142–2150.

[39]

I. W. Tsang, J. T. Kwok, P.-M. Cheung, Core vector machines: Fast SVM training on very large data sets, Journal of Machine Learning Research 6 (Apr) (2005) 363–392.

[40]

M. Lucic, O. Bachem, A. Krause, Strong coresets for hard and soft Bregman clustering with applications to exponential family mixtures, in: Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, 2016, pp. 1–9.

[41]

O. Bachem, M. Lucic, S. H. Hassani, A. Krause, Approximate k-means++ in sublinear time, in: Conference on Artificial Intelligence (AAAI), 2016.

[42]

M. Lucic, M. I. Ohannessian, A. Karbasi, A. Krause, Tradeoffs for space, time, data and risk in unsupervised learning, in: AISTATS, 2015.

[43]

O. Bachem, M. Lucic, A. Krause, Coresets for nonparametric estimation - the case of DP-means, in: International Conference on Machine Learning (ICML), 2015.

[44]

J. H. Huggins, T. Campbell, T. Broderick, Coresets for scalable Bayesian logistic regression, arXiv preprint arXiv:1605.06423.

[45]

G. Rosman, M. Volkov, D. Feldman, J. W. Fisher III, D. Rus, Coresets for k-segmentation of streaming data, in: Advances in Neural Information Processing Systems (NIPS), 2014, pp. 559–567.

[46]

S. J. Reddi, B. Póczos, A. Smola, Communication efficient coresets for empirical loss minimization, in: Conference on Uncertainty in Artificial Intelligence (UAI), 2015.

[47]

C. Sung, D. Feldman, D. Rus, Trajectory clustering for motion prediction, in: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, IEEE, 2012, pp. 1547–1552.

[48]

D. Feldman, A. Sugaya, C. Sung, D. Rus, iDiary: from GPS signals to a text-searchable diary, in: Proceedings of the 11th ACM Conference on Embedded Networked Sensor Systems, ACM, 2013, p. 6.

[49]

D. Feldman, , C. Xian, D. Rus, Private coresets for high-dimensional spaces, submitted, Tech. rep. (2016).

[50]

M. Feigin, D. Feldman, N. Sochen, From high definition image to low space optimization, in: International Conference on Scale Space and Variational Methods in Computer Vision, Springer, 2011, pp. 459–470.

[51]

D. Feldman, M. Feigin, N. Sochen, Learning big (image) data via coresets for dictionaries, Journal of mathematical imaging and vision 46 (3) (2013) 276–291.

[52]

G. Alexandroni, G. Z. Moreno, N. Sochen, H. Greenspan, Coresets versus clustering: comparison of methods for redundancy reduction in very large white matter fiber sets, in: SPIE Medical Imaging, International Society for Optics and Photonics, 2016, pp. 97840A–97840A.

[53]

D. Feldman, T. Tassa, More constraints, smaller coresets: constrained matrix approximation of sparse big data, in: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’15), ACM, 2015, pp. 249–258.

[54]

E. Liberty, Simple and deterministic matrix sketching, in: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, 2013, pp. 581–588.

[55]

K. L. Clarkson, Coresets, sparse greedy approximation, and the Frank-Wolfe algorithm, ACM Transactions on Algorithms (TALG) 6 (4) (2010) 63.

[56]

M. B. Cohen, Y. T. Lee, G. Miller, J. Pachocki, A. Sidford, Geometric median in nearly linear time, in: Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing, ACM, 2016, pp. 9–21.

[57]

P. Drineas, R. Kannan, M. W. Mahoney, Fast Monte Carlo algorithms for matrices I: Approximating matrix multiplication, SIAM Journal on Computing 36 (1) (2006) 132–157.

[58]

P. Drineas, M. W. Mahoney, S. Muthukrishnan, Sampling algorithms for $l_{2}$ regression and applications, in: SODA, 2006.

[59]

A. Dasgupta, P. Drineas, B. Harb, R. Kumar, M. W. Mahoney, Sampling algorithms and coresets for $\ell_{p}$-regression, in: Proc. 19th Annu. ACM-SIAM Symp. on Discrete Algorithms (SODA), 2008, pp. 932–941.

URL http://doi.acm.org/10.1145/1347082.1347184

[60]

A. Munteanu, C. Sohler, D. Feldman, Smallest enclosing ball for probabilistic data, in: Proceedings of the thirtieth annual symposium on Computational geometry, ACM, 2014, p. 214.

[61]

L. Huang, J. Li, J. M. Phillips, H. Wang, epsilon-kernel coresets for stochastic points, in: P. Sankowski, C. D. Zaroliagis (Eds.), 24th Annual European Symposium on Algorithms, ESA 2016, August 22-24, 2016, Aarhus, Denmark, Vol. 57 of LIPIcs, Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2016, pp. 50:1–50:18.

doi:10.4230/LIPIcs.ESA.2016.50.

URL http://dx.doi.org/10.4230/LIPIcs.ESA.2016.50

[62]

S. Har-Peled, S. Mazumdar, On coresets for $k$-means and $k$-median clustering, in: STOC, 2004.

[63]

D. Feldman, M. Langberg, A unified framework for approximating and clustering data, in: Proc. 43rd Annu. ACM Symp. on Theory of Computing (STOC), 2011, see http://arxiv.org/abs/1106.1379 for fuller version.

[64]

G. Pfurtscheller, C. Neuper, C. Guger, W. Harkam, H. Ramoser, A. Schlogl, B. Obermaier, M. Pregenzer, Current trends in Graz brain-computer interface (BCI) research, IEEE Transactions on Rehabilitation Engineering 8 (2) (2000) 216–219.

[65]

R. A. Fisher, The use of multiple measurements in taxonomic problems, Annals of human genetics 7 (2) (1936) 179–188.

[66]

A. M. Martínez, A. C. Kak, PCA versus LDA, IEEE Transactions on Pattern Analysis and Machine Intelligence 23 (2) (2001) 228–233.

[67]

A. L. Goldberger, L. A. N. Amaral, L. Glass, J. M. Hausdorff, P. C. Ivanov, R. G. Mark, J. E. Mietus, G. B. Moody, C.-K. Peng, H. E. Stanley, PhysioBank, PhysioToolkit, and PhysioNet, Circulation 101 (23) (2000) e215–e220.

PDF: http://circ.ahajournals.org/content/101/23/e215.full.pdf, doi:10.1161/01.CIR.101.23.e215.

URL http://circ.ahajournals.org/content/101/23/e215

[68]

G. Schalk, D. J. McFarland, T. Hinterberger, N. Birbaumer, J. R. Wolpaw, BCI2000: a general-purpose brain-computer interface (BCI) system, IEEE Transactions on Biomedical Engineering 51 (6) (2004) 1034–1043.

[69]

L. C. Cruikshank, A. Singhal, M. Hueppelsheuser, J. B. Caplan, Theta oscillations reflect a putative neural mechanism for human sensorimotor integration, Journal of Neurophysiology 107 (1) (2012) 65–77.

[70]

(will be published upon acceptance of this paper), Implementation of coreset for sum of vectors, Tech. rep. (2015).
