TL;DR
This paper introduces a multi-task GAN framework that converts EEG signals evoked by geometrical shapes into accurate, detailed shape reconstructions, addressing low realism issues in EEG-based shape synthesis.
Contribution
It proposes a novel multi-task GAN with a CNN encoder and semantic alignment constraint for improved EEG-to-shape reconstruction, outperforming existing methods.
Findings
Outperforms state-of-the-art baselines in shape quality
Enhances shape realism with semantic alignment
Effective latent representation learning from EEG signals
| Model | GAN | CGAN [4] | ACGAN | Ours |
|---|---|---|---|---|
| Inception Score | 1.931 | 1.986 | 2.061 | 2.178 |
| Inception Accuracy | 0.43 | 0.67 | 0.79 | 0.83 |
‡ University of New South Wales, Sydney, Australia
§ Xi’an Jiaotong University, Xi’an, China
{xiang.zhang3,manqing.dong,chang.ge}@student.unsw.edu.au,
{xiaocong.chen,lina.yao}@unsw.edu.au
Multi-task Generative Adversarial Learning on Geometrical Shape Reconstruction from EEG Brain Signals
Xiang Zhang‡
Xiaocong Chen‡
Manqing Dong‡
Huan Liu§
Chang Ge‡
Lina Yao‡
Abstract
Synthesizing geometrical shapes from human brain activity is an interesting and meaningful but very challenging topic. Recently, advances in deep generative models such as Generative Adversarial Networks (GANs) have enabled object generation from neurological signals. However, Electroencephalography (EEG)-based shape generation still suffers from low realism. In particular, the generated geometrical shapes lack clear edges and fail to contain necessary details. In light of this, we propose a novel multi-task generative adversarial network to convert an individual's EEG signals evoked by geometrical shapes into the original geometry. First, we adopt a Convolutional Neural Network (CNN) to learn a highly informative latent representation of the raw EEG signals, which is vital for the subsequent shape reconstruction. Next, we build the discriminator based on multi-task learning so that it distinguishes fake samples and classifies them simultaneously, where the mutual promotion between the tasks improves the quality of the recovered shapes. Then, we propose a semantic alignment constraint that forces the synthesized samples to approach the real ones at the pixel level, thus producing more compelling shapes. The proposed approach is evaluated on a local dataset, and the results show that our model outperforms competitive state-of-the-art baselines.
Keywords:
EEG; geometrical shape reconstruction; generative adversarial networks
1 Introduction
Since the advent of neuroscience and brain-computer interfaces (BCIs), numerous studies have tried to recover visual stimuli from the informative activity of the human brain [11, 12]. Progress in decoding chaotic brain signals is expected to reveal the mechanisms of brain neurons and may enable ambitious applications such as mind reading [zhang2018converting]. Most existing work has focused on functional magnetic resonance imaging (fMRI), which monitors brain activity by detecting changes associated with blood flow in brain areas. However, fMRI-based image reconstruction faces several major challenges [7, 11]: the temporal resolution of fMRI is low because it is constrained by the speed of blood flow; acquiring fMRI requires an expensive scanner; and the scanner is heavy and has poor portability [12].
Thus, the Electroencephalogram (EEG) has recently drawn much attention because of its high temporal resolution, low price, and high portability. EEG is a non-invasive signal that measures the voltage fluctuations generated by electrical currents within human neurons. Researchers have tried to exploit EEG signals to reconstruct visual stimuli [4, 9] through Generative Adversarial Networks (GANs). Nevertheless, previous studies suffer from low realism in the generated samples: the models cannot generate highly realistic images from the input brain signals. In other words, current EEG-based synthesis methods can roughly present the visual stimuli but fail to capture necessary details. For example, as shown in Figure 1, clear geometric shapes are presented to the individual, and the shapes are then reconstructed from the collected EEG data. The geometric shapes generated by the traditional GAN and CGAN are blurry and lack realistic details.
To address the aforementioned issues, in this paper we conduct experiments to measure the individual's EEG oscillations evoked by various geometrical shapes and propose a novel framework to precisely decode the EEG signals and synthesize the geometric shapes. First, we employ a Convolutional Neural Network (CNN) to learn the latent representation from the raw EEG signals, since our empirical experiments show that a CNN is much more efficient than a Recurrent Neural Network (RNN) while having a similar EEG representation learning ability. In addition, we adopt a multi-task discriminator with a task-specific classifier, which assigns the geometric shape to the correct class, in order to improve the quality of the recovered shapes. Furthermore, we propose a semantic alignment method that uses the semantic information of the real shape to enhance the realism of the reconstructed shape. Previous work has mainly addressed the reconstruction of images (e.g., birds and planes) from brain signals. Such images contain many attributes (e.g., color, shape, size, background, and semantic information), so it is difficult to determine which attribute the human brain is more sensitive to and which one contributes more to the object reconstruction. Thus, in this work, we focus on EEG-based geometric shape reconstruction and attempt to show that EEG signals are sensitive to geometries.
In detail, the contributions of this work are listed here:
- We present a novel deep generative model to recover the geometrical shape seen by a human being from EEG signals. To the best of our knowledge, this is the first work to investigate brain signal based geometric shape reconstruction. The reproducible code is publicly available at https://github.com/xiangzhang1015/EEG__Shape_Reconstruction.
- We propose an effective semantic alignment method that harnesses the semantic information of the original geometric shape in order to force the approach to produce more realistic shapes.
- We collected a local EEG dataset evoked by various geometric shapes and evaluated the proposed approach on it. The experimental results demonstrate that our model outperforms all the competitive state-of-the-art baselines.
2 Related Work
Recent research in neuroscience and neuroimaging [3] indicates that human perception of visual stimuli can be decoded through neuroimaging techniques. Specifically, a few works have provided evidence that brain signals recorded with functional magnetic resonance imaging (fMRI) and EEG can be decoded into human activity. Some works use fMRI signals to reconstruct the image seen by the individual and achieve acceptable performance [7, 6]. These studies show the potential of fMRI-based image reconstruction for brain signal decoding; however, fMRI faces a number of crucial issues such as expensive acquisition equipment and low portability. Apart from the fMRI-based methods, there are a few EEG-based methods for image reconstruction, as EEG signals are less expensive to acquire [4, 9]. As a typical example, Brain2Image [4] encoded the raw EEG signals into a latent space containing the distinctive information and then sent the encoding to a Conditional Generative Adversarial Network (CGAN) for image reconstruction. Palazzo et al. [9] applied a very similar framework.
Most visual object reconstruction methods are based on Generative Adversarial Networks (GANs) and their variants. GANs [2], a typical deep learning framework, are widely used in image generation. A standard GAN is composed of a generator network, which generates images from randomly sampled noise, and a discriminator network, which tries to distinguish generated images from real ones. The original GAN offers no control over the generation process. To mitigate this, the conditional GAN (CGAN) [5] was proposed, which uses conditional information (e.g., labels) to control the generation process. The Auxiliary Classifier GAN (ACGAN) [8] improves the performance of GANs for image synthesis. ACGAN demonstrated that adding more structure to the GAN latent space, along with a specialized cost function, results in higher-quality samples. A task-specific branch in the discriminator is used to enhance its discriminability.
Summary. Most brain signal based image reconstruction work is based on fMRI. Due to the drawbacks of fMRI (e.g., low temporal resolution, high cost, and low portability), we focus on EEG-based geometric shape reconstruction. Compared to typical EEG-based work such as Brain2Image [4], our approach has several technical advantages: 1) we concentrate on the influence of the geometric attribute on the EEG signals, while [4] focuses on images with a large number of attributes; 2) we adopt a CNN instead of an RNN to learn the latent EEG features, which costs less training time with similar accuracy; 3) we add an auxiliary task-specific classifier to improve the discriminability of the discriminator; 4) we propose a semantic alignment method to generate more realistic images.
3 Method
In this study, we aim to convert the geometry in an individual's mind into a physical shape. In particular, we first decode the non-invasive EEG signals into an implicit representation (Section 3.1) and then propose a modified GAN framework to generate the real shape that evoked the EEG signals (Section 3.2). In this section, we introduce the workflow of the whole system in detail.
3.1 EEG Feature Learning
For EEG feature learning, we adopt a CNN structure to capture the latent distinguishable features from the collected EEG signals. Previous research has demonstrated that CNNs are able to learn informative features from noisy EEG data [13, 1]. The EEG data consist of sample pairs, each made up of an EEG observation and the corresponding one-hot label. In this paper, we focus on decoding imagination evoked by five different visual stimuli, so the number of labels is five. The dataset is further characterized by the number of EEG segments and by the temporal and spatial resolution of each segment.
Figure 2 shows the workflow for learning the discriminative representation. The EEG signals evoked by the visual stimuli, reflecting the imagination in the user's mind, are fed into a CNN model with seven layers. The first convolutional layer contains 32 filters, uses 'SAME' padding, and applies the ReLU activation function. The first pooling layer adopts max pooling, with the pooling size equal to the stride. The second convolutional and pooling layers are identical to the first ones, except that Conv 2 has 64 filters. The output of the following fully-connected layer is regarded as the learned representation and contains enough information to reconstruct the visual shape. The model is trained for 1,000 epochs with the Adam optimizer.
Compared to Brain2Image [4], which employed an LSTM for feature learning, the CNN achieves similar performance but requires much less training time. In particular, the LSTM obtained a classification accuracy of 74% in 5,935 s, while the CNN achieved 72% in considerably less time.
3.2 Multi-task Generation Model
3.2.1 Overview
In this part, we describe the framework used to reconstruct the shapes that the human is seeing. As shown in Figure 3, the proposed geometrical shape generation framework contains two components: a generator and a discriminator.
The generator receives the learned discriminative EEG representation along with randomly sampled Gaussian noise and produces the generated shape. The EEG representation is used to make the generated shapes compelling, while the Gaussian noise preserves diversity. The discriminator, in turn, receives the real shape that evoked the brain signals (the imagination present in the human brain) and the generated fake shape. Inspired by ACGAN [8], we design a multi-task discriminator containing two branches. The first branch, as in the standard GAN, aims to recognize fake shapes; the second branch, an auxiliary task-specific classifier, attempts to classify which class the shape belongs to. We call the first branch the real/fake classifier and the second the task-specific classifier. With the task-specific classifier, the discriminator can not only distinguish whether the shape is real but also recognize the category of the shape. As a consequence, the discriminator drives the distribution of the synthesized shapes to approximate not only the general distribution of all real shapes but also the distribution of the specific category. In addition, the learned EEG representation is also input to the discriminator, as proposed in [5], so that the discriminator operates under the same condition as the generator.
3.2.2 Architecture
Next, we report the details of the architecture. The generator receives an input vector formed by concatenating the learned EEG representation and the Gaussian noise, and attempts to map it to a meaningful shape. The generator is composed of a fully-connected layer and two deconvolutional layers, each followed by an upsampling layer. The input vector is first fed into the fully-connected layer:
[TABLE]
Here, the layer is parameterized by a weight matrix and a bias vector and uses the sigmoid activation function. Its output is then reshaped into a three-dimensional tensor, where 64 denotes the depth. At this point, the reshaped tensor has a form similar to the raw EEG segment, but with greater depth, and is expected to contain enough information to reconstruct the user's imagination. It is then sent to the first deconvolutional layer, which has 32 filters and uses 'SAME' padding. The upsampling operation is the inverse of pooling and shares the same parameters as the pooling layer. These layers are followed by the second deconvolutional layer, which has a single filter, and a second upsampling layer. We choose tanh as the activation function since it maps the signals into the range [-1, 1], which is the same range that the real shape falls into. Based on empirical experiments, we set the size of the synthesized shape to four times that of the raw EEG segment in both width and height in order to obtain better generation quality. The real geometric shape is a greyscale image. All the pixels are normalized into the range [0, 1] by max-min normalization and then transformed to [-1, 1] by:
[TABLE]
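The rescaling of the real shapes described above can be sketched as follows. This is a minimal illustration and not the authors' code: the affine map 2x - 1 from [0, 1] to [-1, 1] is an assumption (it is the usual choice when the generator ends in tanh), and the function names `to_tanh_range` and `from_tanh_range` are hypothetical.

```python
import numpy as np

def to_tanh_range(image):
    """Max-min normalize a greyscale image to [0, 1], then map it to [-1, 1]."""
    image = np.asarray(image, dtype=np.float64)
    lo, hi = image.min(), image.max()
    if hi == lo:
        # A constant image has no contrast; place it in the middle of the range.
        return np.zeros_like(image)
    unit = (image - lo) / (hi - lo)  # values in [0, 1]
    return 2.0 * unit - 1.0          # assumed affine map to [-1, 1]

def from_tanh_range(image):
    """Map values in [-1, 1] back to [0, 1], e.g. to display generator output."""
    return (np.asarray(image, dtype=np.float64) + 1.0) / 2.0
```

Matching the range of the real shapes to the tanh output range means the discriminator cannot separate real from generated samples by value range alone.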
In the discriminator, as shown in Figure 3, both the real and the generated shapes are fed into a network that has almost the same structure and hyper-parameters as the discriminative representation learning model (Section 3.1). The input shape is flattened into a vector and then concatenated with the learned EEG representation. The fully-connected layer has 100 nodes. The discriminator has two branches, corresponding to two output layers. The output layer of the real/fake classifier has a single node, which represents the probability that the input is fake. The output layer of the task-specific classifier has five nodes, corresponding to the five geometrical shape categories.
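The two-branch output of the discriminator described above can be illustrated with a small NumPy forward pass. This is a sketch under stated assumptions, not the published implementation: the convolutional layers are omitted, the hidden activation (ReLU), the weight initialization, the 40 x 56 shape size, and the representation dimension used in the usage example are all hypothetical, and the class name `TwoHeadDiscriminator` is invented for illustration. Only the 100-node fully-connected layer, the single-node real/fake output, and the five-node class output come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

class TwoHeadDiscriminator:
    """Shared hidden layer followed by a real/fake head and a class head."""

    def __init__(self, shape_dim, repr_dim, hidden=100, n_classes=5):
        in_dim = shape_dim + repr_dim
        self.w_h = rng.normal(0.0, 0.02, (in_dim, hidden))
        self.b_h = np.zeros(hidden)
        self.w_s = rng.normal(0.0, 0.02, (hidden, 1))          # real/fake head
        self.b_s = np.zeros(1)
        self.w_c = rng.normal(0.0, 0.02, (hidden, n_classes))  # task-specific head
        self.b_c = np.zeros(n_classes)

    def forward(self, shapes, eeg_repr):
        flat = shapes.reshape(shapes.shape[0], -1)   # flatten each shape
        x = np.concatenate([flat, eeg_repr], axis=1) # condition on the EEG representation
        h = np.maximum(0.0, x @ self.w_h + self.b_h) # assumed ReLU hidden layer
        p_fake = sigmoid(h @ self.w_s + self.b_s)[:, 0]
        p_class = softmax(h @ self.w_c + self.b_c)
        return p_fake, p_class
```

For example, `TwoHeadDiscriminator(shape_dim=40 * 56, repr_dim=16).forward(shapes, eeg_repr)` returns one fake probability and one five-way class distribution per input shape.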
3.2.3 Loss Function
We now present the loss functions of the proposed framework. For the generator, since we add a task-specific classifier, the loss function contains two components: one pushes the discriminator to fail to recognize that the shape is generated, while the other pushes the discriminator to recognize which category the shape belongs to. Thus, the log-likelihood loss function for the generator can be defined as [8]:
[TABLE]
in which,
[TABLE]
describes the generator , and
[TABLE]
describes the real/fake classifier and the task-specific classifier of the discriminator, respectively. As for the discriminator, the loss function also contains two components, one from each classifier. The discriminator is supposed to identify which shapes are generated and, at the same time, assign each shape to the correct class. The log-likelihood loss function for the discriminator is:
[TABLE]
In the above formulas, the quantities involved are the class label, the predicted class and the predicted source (the classification results of the multi-task discriminator), and the shape fed into the discriminator. The real/fake classifier outputs the probability distribution over the source, while the task-specific classifier outputs the probability distribution over the class label.
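Since the section defines its losses with reference to ACGAN [8], the two log-likelihood terms can be sketched in the standard ACGAN form: a source term (real versus generated) and a class term, with the discriminator maximizing their sum and the generator maximizing the class term minus the source term. The code below assumes that standard form and may differ in detail from the paper's equations; all function and argument names are hypothetical, and the losses are written as quantities to minimize.

```python
import numpy as np

EPS = 1e-12  # guards against log(0)

def source_log_likelihood(p_fake_on_real, p_fake_on_fake):
    """L_S: log-likelihood that the real/fake head names the correct source.

    Both arguments are the head's 'fake' probabilities, on real and on
    generated shapes respectively.
    """
    real_term = np.mean(np.log(1.0 - p_fake_on_real + EPS))
    fake_term = np.mean(np.log(p_fake_on_fake + EPS))
    return real_term + fake_term

def class_log_likelihood(p_class_on_real, p_class_on_fake, labels_real, labels_fake):
    """L_C: log-likelihood that the class head names the correct category."""
    rows_r = np.arange(len(labels_real))
    rows_f = np.arange(len(labels_fake))
    real_term = np.mean(np.log(p_class_on_real[rows_r, labels_real] + EPS))
    fake_term = np.mean(np.log(p_class_on_fake[rows_f, labels_fake] + EPS))
    return real_term + fake_term

def discriminator_loss(l_s, l_c):
    # The discriminator maximizes L_S + L_C, so it minimizes the negative.
    return -(l_s + l_c)

def generator_loss(l_s, l_c):
    # The generator maximizes L_C - L_S: correct classes, but a fooled source head.
    return -(l_c - l_s)
```

Both players are rewarded for correct class predictions, which is what ties the generated shapes to a specific category rather than to the overall shape distribution.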
3.3 Semantic Alignment
At this point, the geometrical shape reconstruction model is able to generate a batch of samples that have enough diversity but still limited discriminability. To increase the discriminability of the generated samples and make them more realistic and sharper, we propose a semantic alignment method that exploits the semantic information of the real shape. In particular, we add a constraint to the generator loss function that reduces the distance between the real and the generated geometric shapes.
The semantic distance can be measured by:
[TABLE]
where the distance is computed over all pixels of the geometric sample, between the pixels of the real sample and the corresponding pixels of the generated sample. In order to improve the performance of the generator, the semantic distance is used as a regularization term in the generator loss. Thus, we update Equation 3 as:
[TABLE]
where the constant coefficient adjusts the weight of the semantic regularization. If the alignment constraint is too strong, the generated shapes may have less diversity. In this work, we set the coefficient to 0.001 (Section 4.1) to make a trade-off between the diversity and discriminability of the generated samples.
During training, both the generator and the discriminator are optimized by the Adam optimizer. The learning rate is set to 0.0002 with an exponential decay rate of 0.5. In each epoch, the generator and the discriminator are trained separately in turn. The proposed framework converges after 120 epochs and tends to overfit after 160 epochs; thus, we adopt an early stopping strategy and stop training at the 150th epoch.
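The semantic alignment term can be sketched as a pixel-level distance added to the generator loss. The exact distance used in the paper is not reproduced here, so the mean absolute pixel difference below is an assumption (a mean squared difference would fit the description equally well); the function names are hypothetical, and the default coefficient of 0.001 is the value reported for the semantic regularization.

```python
import numpy as np

def semantic_distance(real, generated):
    """Assumed form: mean absolute difference over corresponding pixels."""
    real = np.asarray(real, dtype=np.float64)
    generated = np.asarray(generated, dtype=np.float64)
    return float(np.mean(np.abs(real - generated)))

def regularized_generator_loss(adversarial_loss, real, generated, coeff=0.001):
    """Generator loss plus the weighted semantic alignment term."""
    return adversarial_loss + coeff * semantic_distance(real, generated)
```

A small coefficient keeps the adversarial terms dominant, so the pixel-level term nudges samples toward the real shape without collapsing their diversity.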
4 Experiments
In this section, we describe the experiments and analyze the performance both qualitatively and quantitatively. The qualitative comparison analyzes the quality of the generated shapes, and the quantitative comparison is based on the inception score [10] and the inception accuracy.
4.1 EEG Signal Acquisition
We conducted a local experiment with 8 healthy participants (6 males and 2 females) aged 25 ± 3, which was approved by the UNSW ethics board (HC190315). During the experiment, the participant sits in a comfortable armchair in front of a computer monitor. We select five representative and commonly seen geometrical shapes (circle, star, triangle, rhombus, and rectangle) to present to the subject. The whole experiment contains two sessions, and each session has five trials. In each trial, the five geometrical shapes are presented in random order and each shape is shown for five seconds. There is a five-second relaxation period between two adjacent shapes. The relaxation periods between trials and between sessions are 10 s and 30 s, respectively. The EEG signals are collected with a portable Emotiv EPOC+ headset with 14 electrodes, and the sampling frequency is set to 128 Hz. Each EEG segment contains ten consecutive instances, with 50% overlap between segments. The dataset is randomly divided into a training set (80%) and a testing set (20%).
Based on the collected EEG data, we report the hyper-parameter settings. Each EEG segment is compressed into a latent discriminative representation. In the generator, stochastically sampled noise is concatenated with this representation. The coefficient of the semantic regularization is set to 0.001.
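The segmentation described above (ten consecutive instances per segment, 50% overlap, 14 electrodes) corresponds to a sliding window with a step of five samples. The sketch below illustrates that windowing only; the function name `segment_eeg` and the array layout (time along the first axis, channels along the second) are assumptions.

```python
import numpy as np

def segment_eeg(recording, window=10, overlap=0.5):
    """Cut a (samples, channels) recording into overlapping segments.

    Returns an array of shape (n_segments, window, channels).
    """
    recording = np.asarray(recording)
    step = int(window * (1.0 - overlap))
    if step < 1:
        raise ValueError("overlap leaves no step between segments")
    if recording.shape[0] < window:
        raise ValueError("recording is shorter than one window")
    starts = range(0, recording.shape[0] - window + 1, step)
    return np.stack([recording[s:s + window] for s in starts])
```

For one five-second stimulus at 128 Hz (640 samples), this yields 127 segments of shape 10 x 14.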
4.2 Qualitative Comparison
In this section, we compare the quality of the shapes generated by the proposed method and by state-of-the-art models. As shown in Figure 4, we choose the most widely used generative models, namely GAN, CGAN, and ACGAN, as the baselines.
GANs achieve promising results in many areas, especially in image generation [2]. Building on the basic GAN, CGAN [5] adds conditional information as a constraint and is adopted in [4]. Furthermore, ACGAN [8] exploits the informative sample labels more deeply to enhance the discriminability of the discriminator. Compared to ACGAN, our work proposes a semantic alignment method that constrains the distance between the synthesized shapes and the visual geometrical shapes in order to further improve realism.
Figure 4 shows that our approach produces the best shape quality. Specifically, the samples generated by GAN lack clear edges, which is a typical mode collapse problem; moreover, most of the synthesized shapes mix features of several shapes. CGAN performs better than the plain GAN, as its shapes are more complete. However, some shapes generated by CGAN still combine features of different classes; for example, a star may have features of a rhombus. ACGAN achieves the best result among the baseline models: it learns most of the shapes' features and correctly reconstructs the shapes with minor, acceptable flaws. Our model reconstructs all the shapes correctly and achieves the highest similarity to the ground truth.
4.3 Quantitative Comparison
The qualitative comparison is relatively easy, as the assessment criterion is the visual quality of the shapes. Quantitative analysis is harder to conduct, as there is no obvious and clearly defined way to compare reconstructed and real shapes. A common approach is to use the inception score and the inception accuracy [4]. We build an inception network that takes the generated shapes as input in order to calculate the inception score, which measures how realistic the generated shapes are. In detail, we generate 1,000 images for each geometric shape and calculate the overall inception score. Moreover, our work aims to convert specific EEG signals into the corresponding geometrical shape with the specific label. Thus, we use the performance of the task-specific classifier on the generated shapes as the inception accuracy, in order to measure how precisely our model generates shapes.
We conduct the quantitative analysis for the baselines and our proposed model. The results are presented in Table 1, which shows that our model achieves the highest inception score and inception accuracy, 2.178 and 0.83, respectively. The inception score is not as good as those reported on public datasets such as CIFAR-10; the most likely reason is that our generated shapes are conditioned on EEG signals, which are chaotic and have a low signal-to-noise ratio. Even so, the proposed approach outperforms all the competitive baselines.
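The two metrics can be sketched from classifier outputs alone. The code below uses the standard definition of the inception score, the exponential of the mean KL divergence between each sample's predicted class distribution and the marginal distribution over all samples; it assumes the class probabilities for the generated shapes are already available and does not reproduce the inception network built in the paper. Function names are hypothetical.

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    """exp(mean_x KL(p(y|x) || p(y))) for an (n_samples, n_classes) array."""
    probs = np.asarray(probs, dtype=np.float64)
    marginal = probs.mean(axis=0, keepdims=True)  # p(y) over all generated samples
    kl = np.sum(probs * (np.log(probs + eps) - np.log(marginal + eps)), axis=1)
    return float(np.exp(kl.mean()))

def inception_accuracy(probs, labels):
    """Fraction of generated shapes classified as the intended category."""
    predicted = np.argmax(np.asarray(probs), axis=1)
    return float(np.mean(predicted == np.asarray(labels)))
```

Under this definition the score ranges from 1 (uninformative predictions) up to the number of classes, so with five shape categories it cannot exceed 5, which is one reason values here are not directly comparable with scores on datasets that have more classes.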
5 Discussion and Future Work
In this section, we discuss the open challenges and potential future work of our research.
First, one major issue faced by brain signal based reconstruction is the recovery of unseen geometrical shapes. For instance, one future direction is to decode the EEG signals evoked by a star when no star was included in training the reconstruction model. One possible solution is to train a common generative model on a large number of classes of basic geometrical shapes (e.g., circle, ellipse, straight line, triangle, rectangle, and rhombus) in order to learn the latent features of each shape and then approximate the unseen shape (e.g., star) based on the learned features.
Second, as a preliminary study, this work focuses only on simple geometrical shapes; however, real-world applications demand more complex shapes, such as a bow. One direction of our future work is to consider more complicated geometric shapes in the experiments. Another potential research direction is to increase the number of geometrical categories, since this work only evaluated five basic classes.
Last but not least, more participants should be involved in the experiments in order to build a general generative model that is robust across individuals. The influence of inter-subject variability should be taken into account in future research.
6 Conclusion
In this paper, we propose a novel approach to reconstruct geometrical shapes from brain signals. We first develop a framework that learns the latent discriminative representation of the raw EEG signals and then, based on the learned representation, propose an adversarial reconstruction framework to recover the geometric shapes that the human is viewing. In particular, we propose a semantic alignment method to improve the realism of the generated samples. The proposed approach is evaluated on a local dataset, and the experiments show that our model outperforms competitive state-of-the-art methods both quantitatively and qualitatively.
References
- [1] Acharya, U.R., Oh, S.L., Hagiwara, Y., Tan, J.H., Adeli, H.: Deep convolutional neural network for the automated detection and diagnosis of seizure using EEG signals. Computers in Biology and Medicine 100, 270–278 (2018)
- [2] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: Advances in Neural Information Processing Systems. pp. 2672–2680 (2014)
- [3] Haynes, J.D., Rees, G.: Neuroimaging: decoding mental states from brain activity in humans. Nature Reviews Neuroscience 7(7), 523 (2006)
- [4] Kavasidis, I., Palazzo, S., Spampinato, C., Giordano, D., Shah, M.: Brain2Image: Converting brain signals into images. In: Proceedings of the 25th ACM International Conference on Multimedia. pp. 1809–1817. ACM (2017)
- [5] Mirza, M., Osindero, S.: Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784 (2014)
- [6] Naselaris, T., Prenger, R.J., Kay, K.N., Oliver, M., Gallant, J.L.: Bayesian reconstruction of natural images from human brain activity. Neuron 63(6), 902–915 (2009)
- [7] Nishimoto, S., Vu, A.T., Naselaris, T., Benjamini, Y., Yu, B., Gallant, J.L.: Reconstructing visual experiences from brain activity evoked by natural movies. Current Biology 21(19), 1641–1646 (2011)
- [8] Odena, A., Olah, C., Shlens, J.: Conditional image synthesis with auxiliary classifier GANs. In: Proceedings of the 34th International Conference on Machine Learning, Volume 70. pp. 2642–2651. JMLR.org (2017)
