Using Deep-Convolutional GANs to Improve SSVEP Classification Systems

An in-depth overview of DCGANs and its implications within the brain-computer interface space

Bagavan Marakathalingasivam
11 min read · Jun 1, 2022

This article summarizes the paper “Simulating Brain Signals: Creating Synthetic EEG Data via Neural-Based Generative Models for Improved SSVEP Classification” and is intended for readers who already have a general understanding of AI and deep learning.

Enter the realm of Brain-Computer Interfaces

A brain-computer interface (BCI) is a device that can capture brain signals from a person, analyze them and translate them into a specific output or command. There are different methods for capturing brain signals with BCIs, and many of them are invasive or semi-invasive: you would need to undergo some sort of surgery to implant electrodes in or on your brain. However, one of the most common methods is a non-invasive one known as electroencephalography (EEG).


Electroencephalography (electro-enceph-alo-graphy), or EEG for short, records the electrical activity of the brain using electrodes (small metal discs with thin wires) placed on the surface of your head. When large groups of neurons in our brain fire together, they create electrical activity strong enough to be picked up by these electrodes.

An example of what an EEG headset could look like | Source

These EEG headsets are a lot cheaper and more accessible to work with than invasive methods. However, one issue with EEG data is its low spatial resolution: because the signals are recorded through the skull and scalp, it is hard to tell exactly where in the brain they came from, so the data is a lot less “clear” than with other methods.

EEG headsets have a ton of use cases. Many fall under the healthcare space, such as medical diagnosis (comas, seizure detection and prevention, etc.), but they are also used to aid meditation, to play video games and a lot more.

The SSVEP Phenomenon

One really interesting thing about our brainwaves is that there can be a spike of electrical activity in response to specific stimulation of a sensory nerve pathway.

For example, we can use an EEG headset to detect how fast a stimulus on a screen you are looking at is flickering, from just your brainwaves. This is because the frequency of the evoked brain activity tracks the flicker frequency (meaning that if the flicker rate increases, the frequency of the brain response increases as well).

This paradigm is known as Steady-State Visually Evoked Potential (SSVEP).

The spike(s) in the graph represent the SSVEP signal | Source

As the name suggests, SSVEPs are signals produced in response to visual stimulation at a specific frequency (or frequencies), generated by repeatedly flashing a stimulus at those frequencies in front of a person (like a flickering region of a screen).

Now, this idea is cool and all, but what can it actually be used for?

Being able to detect these responses can allow for a variety of applications, one of them being able to communicate with external devices.

One of the most popular applications of SSVEP classification relates to assisting people with ALS, or Amyotrophic Lateral Sclerosis. ALS is a neurological condition in which motor neurons (the neurons that control our muscles) start dying, leaving people unable to talk, walk or even breathe.

Stephen Hawking, one of the most famous physicists in the world, was affected by ALS. The disease made it next to impossible for him to communicate with others without the assistance of external devices.

Diagram showing how our SSVEP-based speller would work | Source

We can use BCIs to build a system that generates words for someone who’s affected by this condition. Imagine a matrix of letters, each flickering at a different frequency. By leveraging the SSVEP paradigm, we can determine which letter a person is focusing on from the dominant frequency in their EEG data and output that letter onto the screen.
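To make the “read the frequency, pick the letter” idea concrete, here is a minimal sketch that measures signal power at each candidate flicker frequency (a single DFT bin) and picks the strongest one. The sampling rate, the three-letter layout and the toy signal are all hypothetical, and this simple power comparison is only an illustration; it is not the classifier used in the paper (which, as described later, is a CNN).

```python
import math

FS = 250  # assumed sampling rate in Hz (not specified in the article)

def power_at(signal, freq, fs=FS):
    """Power of `signal` at one frequency (Hz), computed as a single DFT bin."""
    re = im = 0.0
    for n, x in enumerate(signal):
        angle = 2 * math.pi * freq * n / fs
        re += x * math.cos(angle)
        im -= x * math.sin(angle)
    return (re * re + im * im) / len(signal)

def detect_target(signal, targets, fs=FS):
    """Return the target whose flicker frequency carries the most power."""
    return max(targets, key=lambda name: power_at(signal, targets[name], fs))

# Hypothetical 3-letter speller: each letter flickers at its own frequency.
letters = {"A": 10.0, "B": 12.0, "C": 15.0}

# Toy 3-second "EEG" trace: a 12 Hz SSVEP plus a slow 1.3 Hz drift.
t = [n / FS for n in range(3 * FS)]
eeg = [0.5 * math.sin(2 * math.pi * 12 * s) + 0.3 * math.sin(2 * math.pi * 1.3 * s)
       for s in t]

print(detect_target(eeg, letters))  # -> B
```

Because 10, 12 and 15 Hz each complete a whole number of cycles in the 3-second window, their DFT bins do not leak into one another, which keeps this toy comparison clean. Real recordings are far noisier, which is exactly why learned classifiers are used.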

But there’s a problem…

When deep learning techniques are applied to classifying SSVEP signals (and to other EEG applications), the accuracy tends to be low. The biggest reason for this shortcoming goes back to the data: the EEG datasets used to train deep learning models tend to be small, and the quality of the data is also a lot worse (broken channels, noise, etc.).

Example of what low-quality data might look like (many bad channels and a lot of noise) | Source

However, collecting high-quality EEG data in large quantities is very difficult for various reasons, such as the heavy reliance on careful per-subject calibration and the need to keep bad channels to a minimum. This means that running these EEG experiments to collect data is not only expensive and time-consuming but also extremely difficult.

Now, instead of focusing on how we can change these experiments to collect high-quality EEG data, what if we approached the problem in a different way?

What if we could, somehow, get rid of this process entirely? What if we could create synthetic EEG data for our models to train on?

If we could do that, collecting EEG data would be much less of a bottleneck for training our models, opening the door to a lot more applications of this technology. In this article, I’ll go over the Deep-Convolutional Generative Adversarial Network (DCGAN) and how it’s used to increase the accuracy of an SSVEP classifier.

Breaking Down Deep-Convolutional Generative Adversarial Networks

Understanding Vanilla GANs

Generative adversarial networks (or GANs) are a type of generative neural network: they are capable of creating fake data that has never existed before. A GAN consists of two neural networks: a generator and a discriminator.

The generator’s job is to learn the features of the training data and produce new samples that recreate them. In the standard setup, it takes a vector of random noise as input and transforms it into a synthetic sample.

The discriminator’s job is to determine whether a sample is real (from the training data) or fake (produced by the generator). Its verdicts are then used to penalize the generator for unrealistic samples.

Visual representation of how a GAN works | Source

These two networks are trained simultaneously until they reach an equilibrium: the point at which the discriminator can no longer tell whether the generator’s output is real or fake, so its best guess for any sample is 50/50.
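The tug-of-war above is usually written as a pair of binary cross-entropy losses. The sketch below assumes the standard formulation from the original GAN paper, with the common “non-saturating” generator loss; the article does not say which loss variant the paper used. The inputs are hypothetical discriminator outputs, i.e. probabilities that each sample is real.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy for D: it should output 1 for real samples and 0 for fakes.
    d_real / d_fake are D's predicted probabilities that each sample is real."""
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """Non-saturating generator loss: G is rewarded when D calls its fakes real."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)

# Early in training: D easily separates real from fake, so G's loss is high.
print(discriminator_loss([0.9, 0.95], [0.1, 0.05]))
print(generator_loss([0.1, 0.05]))

# At equilibrium D outputs 0.5 for everything; its loss settles at 2 * ln(2).
print(discriminator_loss([0.5, 0.5], [0.5, 0.5]))
```

The equilibrium value is simply -ln(0.5) for the real term plus -ln(0.5) for the fake term, which is what “the discriminator can no longer tell” looks like numerically.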

Now, a vanilla GAN is built from fully-connected layers (a standard feed-forward neural network architecture), which can perform really well. However, EEG data is known to be very sensitive: changing even one value of the data can impact our model’s accuracy. This means our generator needs to capture the data’s features and patterns closely in order to create good synthetic data, which can be difficult with just a feed-forward network of fully-connected layers.

So how are we going to get rid of this problem?

Introducing Deep-Convolutional GANs

Deep-Convolutional GANs (or DCGANs) are a type of GAN built on Convolutional Neural Networks (CNNs) rather than only fully-connected layers, which lets them extract more features and representations from the data while being computationally less expensive.

This is done through convolution, an operation that produces a new signal (the output) by combining two other signals: in our case, an input tensor and a kernel tensor (the filter that slides across our data to extract features).
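One concrete reason convolutional layers are cheaper is weight sharing: the same small kernel is reused at every time step, so the parameter count does not grow with the signal length. The quick count below compares a dense layer with a 1D convolution; the layer sizes are hypothetical (a single-channel, 750-sample signal, i.e. 3 s at an assumed 250 Hz) and are not taken from the paper.

```python
def dense_params(in_features, out_features):
    """Weights plus biases of a fully-connected layer."""
    return in_features * out_features + out_features

def conv1d_params(in_channels, out_channels, kernel_size):
    """Weights plus biases of a 1D convolution; the kernel is shared across time."""
    return in_channels * out_channels * kernel_size + out_channels

# Hypothetical single-channel signal of 750 samples.
print(dense_params(750, 750))   # 563250
print(conv1d_params(1, 16, 7))  # 128
```

The two layers do not produce identically shaped outputs (the convolution yields 16 feature channels), but the comparison shows why stacking convolutions scales much better than stacking dense layers on long EEG recordings.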

Convolutional Neural Networks, or CNNs for short, are neural networks specifically designed for analyzing visual imagery. These networks can recognize the different complex patterns within an image.

If you want to learn more about DCGANs (and want to implement one for yourself) then feel free to check out this article.

Now that we understand how our DCGAN works, let’s go into the rest of the process of training our model and generating synthetic data. In the next section, I’ll be going over the different steps that were performed when running the experiment.

Step 1: Collecting the Data

To generate fake data, we need to first have data for our DCGAN to train on. The experiment from the paper collects SSVEP data by doing the following:

Objects in a video are detected using an object detection algorithm. Black-and-white boxes are then overlaid on the detected objects and flickered using display frequency modulation at 10, 12 and 15 Hz, so that looking at an object evokes an SSVEP at that object’s frequency.
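On a monitor, a flicker frequency is produced by switching a box between white and black on specific frames. The sketch below assumes a 60 Hz refresh rate (the article does not give one) and splits each flicker cycle into an “on” half and an “off” half, measured in frames. The function name and layout are hypothetical, for illustration only.

```python
REFRESH_HZ = 60  # assumed monitor refresh rate (not given in the article)

def flicker_pattern(freq_hz, n_frames, refresh_hz=REFRESH_HZ):
    """Per-frame on/off (white/black) state for a box flickering at freq_hz.
    Integer arithmetic splits each cycle into an 'on' half and an 'off' half."""
    return [((2 * freq_hz * k) // refresh_hz) % 2 == 0 for k in range(n_frames)]

for f in (10, 12, 15):
    frames = flicker_pattern(f, 12)
    print(f"{f} Hz:", "".join("#" if on else "." for on in frames))
# 10 Hz: ###...###...
# 12 Hz: ###..###..##
# 15 Hz: ##..##..##..
```

All three frequencies divide 60 evenly (6, 5 and 4 frames per cycle), so each box repeats a fixed pattern. Because a 12 Hz cycle is an odd 5 frames long, its duty cycle cannot be exactly 50% (3 frames on, 2 off), but its fundamental frequency is still 12 Hz.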

This experiment was performed to create two different datasets:

  1. Video-Stimuli Dataset: a dataset that includes 50 unique sample recordings for each of the 3 classes (objects) taken from 1 subject
  2. NAO Dataset: a total of 80 samples across the 3 classes, recorded from 3 different subjects (S01, S02, S03)

Diagram of how the data is being collected | Source

Step 2: Synthetic Data Generation

After collecting the data, we need to train our generative model. Before we get into that, let’s take a closer look at the model’s architecture and how it will generate the EEG data.

The Model Architecture

Our generator consists of 4 layers:

  • One fully-connected layer that takes in the generator’s input
  • Three 1D transpose convolutional layers

The discriminator consists of two layers:

  • One 1D Convolutional layer
  • One fully-connected layer for our output
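It helps to track how the signal length changes through these layers. The sketch below uses the standard output-length formulas for 1D convolutions and 1D transposed convolutions with dilation 1 (the same formulas PyTorch documents for `Conv1d` and `ConvTranspose1d`). Every size here (an initial length of 96, kernel size 4, stride 2, padding 1, and a target of 768 samples, i.e. 3 s at an assumed 256 Hz) is hypothetical; the article does not list the paper’s layer settings.

```python
def conv_transpose1d_len(length, kernel_size, stride=1, padding=0, output_padding=0):
    """Output length of a 1D transposed convolution (dilation = 1)."""
    return (length - 1) * stride - 2 * padding + kernel_size + output_padding

def conv1d_len(length, kernel_size, stride=1, padding=0):
    """Output length of a 1D convolution (dilation = 1)."""
    return (length + 2 * padding - kernel_size) // stride + 1

# Hypothetical generator: dense layer output reshaped to (channels, 96),
# then three transposed convolutions that each double the length.
length = 96
for layer in range(1, 4):
    length = conv_transpose1d_len(length, kernel_size=4, stride=2, padding=1)
    print(f"transposed conv {layer}: length {length}")
# transposed conv 1: length 192
# transposed conv 2: length 384
# transposed conv 3: length 768

# Hypothetical discriminator: one strided convolution halves the length
# before the fully-connected output layer.
print("discriminator conv:", conv1d_len(768, kernel_size=4, stride=2, padding=1))  # 384
```

With kernel size 4, stride 2 and padding 1, each transposed layer exactly doubles the length, which is why this configuration is a popular choice for upsampling generators.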

1D Convolutional Layers

Generally, we use CNNs when working with images; however, since each EEG channel is a one-dimensional time series, we use 1D convolutional layers instead. This type of convolution extracts features along a single axis, meaning the kernel (a small vector of weights used to extract features from the data) can only slide in one direction, along time, to create the output.

Example of single channel 1D convolution | Source
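A single-channel 1D convolution like the one pictured can be written in a few lines. This is a minimal sketch of the “valid” (no padding) case; as in deep learning libraries, the kernel is not flipped, so strictly speaking it computes a cross-correlation. The example kernel is a hypothetical difference filter chosen because its output is easy to check by hand.

```python
def conv1d(signal, kernel, stride=1):
    """'Valid' 1D convolution as used in deep learning (cross-correlation:
    the kernel is not flipped). The kernel slides along the time axis only."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(0, len(signal) - k + 1, stride)
    ]

# A difference kernel responds only where the signal changes, e.g. at a step.
signal = [0, 0, 0, 1, 1, 1]
print(conv1d(signal, [-1, 1]))  # [0, 0, 1, 0, 0]
```

In a CNN, the kernel values are not hand-picked like this; they are learned during training, and each layer typically runs many kernels in parallel to produce multiple feature channels.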

Now, a transposed convolution is similar to a regular convolution, except that it upsamples its input: the output is longer than the input. This lets our generator grow a small input into a full-length synthetic signal.

In our generator, these 1D transposed convolutional layers take the output of the dense layer and learn features that turn it into new, ‘fake’ EEG outputs.
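One way to picture a transposed convolution is that every input value “stamps” a scaled copy of the kernel into the output at stride-spaced positions, and overlapping stamps are summed. The sketch below implements that view for a single channel with no padding; the input and kernel values are hypothetical, and in a real generator the kernel would be learned.

```python
def conv_transpose1d(signal, kernel, stride=2):
    """1D transposed convolution (single channel, no padding): each input value
    adds a scaled copy of the kernel into the output, and overlaps are summed."""
    k = len(kernel)
    out = [0.0] * ((len(signal) - 1) * stride + k)
    for i, x in enumerate(signal):
        for j, w in enumerate(kernel):
            out[i * stride + j] += x * w
    return out

# A 3-sample input is upsampled to (3 - 1) * 2 + 4 = 8 samples.
print(conv_transpose1d([1.0, 2.0, 3.0], [0.5, 1.0, 1.0, 0.5]))
# [0.5, 1.0, 2.0, 2.5, 3.5, 4.0, 3.0, 1.5]
```

The output length, (input length - 1) × stride + kernel size, is exactly the upsampling described above: with stride 2 the signal roughly doubles in length at every layer.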

Leaky ReLU Activation Function

In a neural network, activation functions determine the output of each node in a layer, effectively deciding whether that neuron’s input is important for the prediction (i.e., whether the node should ‘activate’ or not).

One of the most common activation functions is the ReLU (rectified linear unit). It outputs the neuron’s input unchanged if it is positive, and zero otherwise.

One problem with using ReLU in GANs is that it takes the maximum of the input value and zero, so every negative input becomes zero: we lose the information in those inputs, and no gradient flows back through them during training. To overcome this, we use a similar function known as the Leaky ReLU activation function.

Leaky ReLU function

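Leaky ReLU keeps a small slope for negative inputs (it outputs αx for x < 0, where α is a small constant), so negative inputs still pass some information and gradient through the network instead of being cut off entirely.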
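The difference is easiest to see side by side, including the gradients that backpropagation uses. This sketch assumes a slope of α = 0.2, the value used in the original DCGAN paper; the article does not state which slope this paper used. At exactly x = 0 the functions are not differentiable, and the code simply follows the common convention of using the negative-side slope there.

```python
def relu(x):
    return max(0.0, x)

def relu_grad(x):
    """ReLU passes no gradient at all for non-positive inputs."""
    return 1.0 if x > 0 else 0.0

def leaky_relu(x, alpha=0.2):
    """Keeps a small slope (alpha) for negative inputs instead of zeroing them."""
    return x if x > 0 else alpha * x

def leaky_relu_grad(x, alpha=0.2):
    """Gradient is 1 for positive inputs and alpha otherwise, never zero."""
    return 1.0 if x > 0 else alpha

for x in (-2.0, -0.5, 1.5):
    print(x, relu(x), relu_grad(x), leaky_relu(x), leaky_relu_grad(x))
```

For x = -2.0, ReLU outputs 0.0 with a zero gradient, while Leaky ReLU outputs -0.4 and still passes a gradient of 0.2, so the corresponding weights can keep learning.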

Running our DCGAN

This DCGAN was trained using data from the Video-Stimuli dataset only. Using a single dataset here tests generalization: it lets us see whether the data generated by this model can improve our classification model on different data.

Once trained, the model was used to generate a completely new dataset. In this experiment specifically, it generated 500 unique samples, each 3 seconds long, for the three SSVEP frequencies (10, 12 and 15 Hz).

Visualizing each EEG dataset (the WGAN and VAE were other methods used in the paper) | Source

Now, all that’s left is to train our classification model on this new data, so let’s talk about that.

Step 3: Classification Procedure

Now that we have our data, the first step, as in most deep learning tasks, is preprocessing. This matters especially for EEG data, which tends to be very noisy. EEG captures electrical signals at the scalp, but nearly everything we do creates electrical activity: moving our eyes or our jaws, and even our heartbeats, all get captured by our BCI device. These unwanted signals are known as artifacts, and that’s exactly why we need to preprocess and filter our data so that we mainly capture our brainwaves.

Preprocessing the EEG Data

Comparison of Raw vs. Preprocessed EEG data | Source

We will be using two different filters to get rid of this noise: a bandpass filter and a notch filter.

The bandpass filter lets a certain range of frequencies pass and attenuates the rest of the data. In our case, we use a bandpass filter with a 9–60 Hz passband to help get rid of heartbeats, jaw clenches and other artifacts that fall largely outside this range.

There is also the notch filter, which is roughly the opposite of a bandpass filter: it attenuates the power in a narrow range of frequencies and lets everything else through. We use this filter to eliminate powerline noise, which occurs at 50 Hz or 60 Hz depending on the local mains frequency.

Representation of the bandpass (left) and notch (right) filters | Source
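The article does not specify the filter type, order or sampling rate, so the sketch below is only an illustration under stated assumptions: a 250 Hz sampling rate and simple second-order (“biquad”) band-pass and notch filters built from the well-known RBJ Audio EQ Cookbook formulas. A production pipeline would more likely use higher-order Butterworth filters applied with zero phase (for example SciPy’s `butter` and `filtfilt`), which give much sharper band edges than a single biquad. The helper names are hypothetical.

```python
import cmath
import math

FS = 250.0  # assumed sampling rate in Hz (not given in the article)

def bandpass_coeffs(low_hz, high_hz, fs=FS):
    """Second-order band-pass (RBJ cookbook, 0 dB peak gain at the centre)."""
    f0 = math.sqrt(low_hz * high_hz)  # geometric centre of the band
    q = f0 / (high_hz - low_hz)
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    return [alpha, 0.0, -alpha], [1 + alpha, -2 * math.cos(w0), 1 - alpha]

def notch_coeffs(notch_hz, q=30.0, fs=FS):
    """Second-order notch (narrow band-stop) centred on notch_hz (RBJ cookbook)."""
    w0 = 2 * math.pi * notch_hz / fs
    alpha = math.sin(w0) / (2 * q)
    return [1.0, -2 * math.cos(w0), 1.0], [1 + alpha, -2 * math.cos(w0), 1 - alpha]

def gain(coeffs, freq_hz, fs=FS):
    """Magnitude of the filter's frequency response at freq_hz."""
    b, a = coeffs
    z = cmath.exp(-2j * math.pi * freq_hz / fs)  # z^-1 evaluated on the unit circle
    return abs((b[0] + b[1] * z + b[2] * z * z) / (a[0] + a[1] * z + a[2] * z * z))

def filter_signal(coeffs, x):
    """Run the filter over a signal sample by sample (direct form I)."""
    b, a = coeffs
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for x0 in x:
        y0 = (b[0] * x0 + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2) / a[0]
        y.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return y

bandpass = bandpass_coeffs(9.0, 60.0)
notch = notch_coeffs(50.0)  # use 60.0 where the mains frequency is 60 Hz

# Toy signal: a 12 Hz "SSVEP" plus 50 Hz powerline interference, 4 s long.
raw = [math.sin(2 * math.pi * 12 * n / FS) + 0.5 * math.sin(2 * math.pi * 50 * n / FS)
       for n in range(1000)]
clean = filter_signal(notch, filter_signal(bandpass, raw))
```

Chaining the two filters keeps the 12 Hz component (slightly attenuated by the gentle band-pass slope) while the notch removes the 50 Hz interference almost completely once its initial transient has died out.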

Classification with the SCU-CNN

After preprocessing our data, we move on to training our model. The model that we’re using is based on the SSVEP Convolutional Unit (SCU) CNN, with the following architecture:

Our SCU CNN Architecture for classifying the EEG signals | Source

We’re using the SCU-CNN model because it has been shown to perform better than other models when classifying SSVEP EEG data.

After training the CNN with a combination of real and synthetic data, the study found that the accuracy of the model increased from 84 to 89 percent!

Now the accuracy is pretty good… but we want great

Using Pre-training to increase the accuracy

If we want to increase this accuracy even more, we can use pre-training to help our model generalize.

You see, when you train a neural network normally, the model keeps updating its weights and biases (the values applied to each node’s inputs to produce its output for the next layer) to minimize the loss. With pre-training, we first train our model on just the synthetic data until the loss is low, and then continue training this same model (starting from the learned weights and biases) on our real data. This second stage is called fine-tuning, and during it we can also adjust training settings such as the batch size or loss function so that the accuracy increases as well.
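The two-stage workflow can be shown on a deliberately tiny model. The sketch below swaps the SCU-CNN for a one-feature logistic regression trained with full-batch gradient descent, and both datasets are randomly generated, hypothetical stand-ins (a large, clean “synthetic” set and a small, noisier “real” set). It illustrates only the pre-train-then-fine-tune mechanics and says nothing about the paper’s results.

```python
import math
import random

def predict(w, b, x):
    return 1 / (1 + math.exp(-(w * x + b)))

def log_loss(w, b, data):
    return -sum(y * math.log(predict(w, b, x)) + (1 - y) * math.log(1 - predict(w, b, x))
                for x, y in data) / len(data)

def train(w, b, data, epochs, lr=0.1):
    """Full-batch gradient descent on the logistic loss, starting from (w, b)."""
    for _ in range(epochs):
        gw = sum((predict(w, b, x) - y) * x for x, y in data) / len(data)
        gb = sum(predict(w, b, x) - y for x, y in data) / len(data)
        w, b = w - lr * gw, b - lr * gb
    return w, b

def make_data(n, noise):
    """Hypothetical 1-D task: label 1 when the (noisy) feature exceeds 1.0."""
    data = []
    for _ in range(n):
        x = random.uniform(-1, 3)
        data.append((x, 1 if x + random.gauss(0, noise) > 1.0 else 0))
    return data

random.seed(0)
synthetic = make_data(500, noise=0.2)  # plentiful "generated" samples
real = make_data(40, noise=0.5)        # scarce real samples

# Stage 1: pre-train on synthetic data from a fresh initialisation.
w, b = train(0.0, 0.0, synthetic, epochs=200)
# Stage 2: fine-tune the *same* weights on the real data.
w_ft, b_ft = train(w, b, real, epochs=50)

print(f"real-data loss before fine-tuning: {log_loss(w, b, real):.3f}")
print(f"real-data loss after fine-tuning:  {log_loss(w_ft, b_ft, real):.3f}")
```

The key detail is that stage 2 starts from the weights learned in stage 1 rather than from scratch, so the scarce real data only has to adjust an already reasonable model.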

After doing that, here were the results of our classifier:

Accuracy for test classification using synthetic data for pre-training stage. The baseline contains no pre-training. | Source

We can see that pre-training on synthetic data increases the accuracy significantly (especially for subject S01), suggesting that additional training data can greatly improve a model’s performance.

Outline of the overall pipeline | Source


Now, the paper also covers two other types of generative models, VAEs and WGANs, but for the sake of this article I only went into DCGANs, since they are arguably the most common of the three. If you do want to learn more about the other two, stay tuned, because I’ll be putting out another article on those generative models soon. Although these results are promising, this is still only a first step toward making brain-computer interfaces more accessible to people affected by conditions like ALS.

I can’t believe we’re already at the end of this article! I hope you learned and built something new! If you have any questions, feel free to reach out to me on LinkedIn or Twitter :)