How to cross-validate better

  • NB: Document for OpenViBE 1.0.0 and later (updated 13.Mar.2017)


This tutorial explains how to do ‘cross-validation over trials’ with OpenViBE. The idea is to repeatedly divide the set of trials in a BCI timeline into two non-overlapping sets, one used for training and the other for testing. The tutorial first describes the problems that can arise when cross-validation is done inside OpenViBE scenarios, and then proposes an external wrapper solution for getting better estimates with your existing scenarios. The wrapper scripts for this purpose are supplied with the tutorial.

Unfortunately, using the provided scripts is not straightforward and requires you to modify the scenarios you use, but the procedure can in principle be customized for many different settings. This approach could work as a baseline if you need publication-quality cross-validation results.

The scripts of this tutorial are provided as-is. For your personal use case, you will need to understand what they do and modify both them and the scenarios that you want to cross-validate. We assume that the reader is aware of how cross-validation normally works; this is described in standard machine learning and statistical modelling textbooks.

Background: Some possible problems with cross-validation of streamed data

Sometimes users notice that the accuracy predicted by the OpenViBE classifier trainer box appears to exceed the accuracy that is eventually observed in the online scenario. Although the ‘excitement’ and feedback in the online scenario can change the user performance (e.g. in motor imagery), there are also situations where the reported cross-validation accuracies really are too optimistic.

It should be understood that the cross-validation in the classifier trainer box works correctly in the usual machine learning sense: you give it IID (independent and identically distributed) feature vectors, and you get a correct cross-validation estimate of the classifier performance. However, this box cannot be responsible for what happens outside it. In practice, some OpenViBE scenarios create data in such a way that the IID assumption clearly does not hold. Some examples follow.

  • The cross-validation in the box cannot take into account any other, possibly preceding, supervised training stages. For example, CSP or XDAWN can fit an overfitting transform to the data and the labels before the classifier training stage takes place. The final evaluating classifier is in this case e.g. lda(filter(data)) == f(g(data)), but cross-validation only controlled the estimation of the f() part, with data that g() had already strongly overfitted. You can experience this yourself by creating random data X with a high number of channels (e.g. 256), training a CSP with it, and noticing almost perfect cross-validation accuracies on the CSP’d data CSP(X), whereas the true accuracy on fresh data should be at chance level.
  • Often multiple feature vectors are made from the same trial (e.g. in motor imagery ‘think left hand activity for 4 seconds’ is one trial) by using a small time window or epoching, and these vectors originating from the same trial can end up in both the testing and the training folds. Overlapping epoching may make this worse. Since the in-trial vectors may be strongly correlated with each other (time-correlated EEG), this makes the cross-validation estimate from the classifier trainer box more optimistic.
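The second point can be made concrete with a small sketch. The tutorial's own scripts are in R; the sketches added in this text use Python instead. Everything in the block below is hypothetical (the trial and epoch counts, and the function names), and the vector-wise splitter is a generic shuffled k-fold, not a description of how the classifier trainer box partitions its input. It only counts how many trials end up with feature vectors in more than one fold.

```python
import random

def naive_folds(n_vectors, k, seed=0):
    """Assign shuffled feature vectors to k folds, ignoring trial membership."""
    order = list(range(n_vectors))
    random.Random(seed).shuffle(order)
    fold_of = [0] * n_vectors
    for position, vector in enumerate(order):
        fold_of[vector] = position % k
    return fold_of

def trialwise_folds(trial_of_vector, k):
    """Assign whole trials to folds, so all vectors of a trial share one fold."""
    return [trial % k for trial in trial_of_vector]

def leaking_trials(trial_of_vector, fold_of):
    """Return the trials whose vectors are spread over more than one fold.

    When one of those folds is held out for testing, the rest of the
    trial's vectors are still in the training set.
    """
    folds_seen = {}
    for trial, fold in zip(trial_of_vector, fold_of):
        folds_seen.setdefault(trial, set()).add(fold)
    return sorted(t for t, folds in folds_seen.items() if len(folds) > 1)

# Hypothetical recording: 20 trials, 16 epochs (feature vectors) per trial.
n_trials, epochs_per_trial, k = 20, 16, 5
trial_of_vector = [t for t in range(n_trials) for _ in range(epochs_per_trial)]

naive = naive_folds(len(trial_of_vector), k)
by_trial = trialwise_folds(trial_of_vector, k)

print(len(leaking_trials(trial_of_vector, naive)), "trials leak with vector-wise folds")
print(len(leaking_trials(trial_of_vector, by_trial)), "trials leak with trial-wise folds")
```

Splitting by trial removes this particular leak by construction; it does not by itself address the first point, which requires retraining every supervised stage per fold.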

Yet another type of problem is that the eventual online scenario is not covered by the cross-validation estimate given by the ‘classifier trainer’ box. This leaves room for possible errors/discrepancies in the online scenario signal processing chain, or the ‘classifier processor’ box itself, which will do the actual prediction in the online use.

Wrapper solution

You can attempt to do cross-validation across trials by explicitly defining, for each fold, which time segments of the signal belong to the train set and which to the test set, and by using this same segmentation for training all of the supervised learning components in the signal processing scenarios you have. Finally, you make the predictions with an actual test scenario, using the test segments only, and then compare the aggregated predictions against the known ground truth.

In OpenViBE BCI scenarios, boxes called Stream Switch and Stimulation Based Epoching typically control which parts of the data get routed to the supervised learning boxes (such as CSP and classifier trainer) and which class the resulting feature vector gets associated with. One way to make OpenViBE ignore parts of the data is to filter the stimulation streams that contain the markers telling what happens and when. The stimulations related to the trials belonging to the test fold should be removed from the train fold timeline, and vice versa. The parts of the data with no recognizable stimuli then effectively become invisible to the scenario.
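A minimal Python sketch of such a filter is given below, assuming the timeline is available as a list of (time, stimulation code) pairs. The function name, the in-memory representation and the example codes are illustrative assumptions and are not taken from the supplied R scripts. Hidden trial markers are overwritten with 0 instead of being deleted, so both timelines keep the same length and timing.

```python
def split_timeline(timeline, trial_codes, test_trials):
    """Return (train_timeline, test_timeline) built from one stimulation timeline.

    timeline    : list of (time_secs, code) tuples in chronological order
    trial_codes : set of stimulation codes that identify a trial
    test_trials : set of trial indices (0-based, in order of appearance)
                  that are held out for testing in this fold
    """
    train, test = [], []
    trial_index = -1
    for time_secs, code in timeline:
        if code in trial_codes:
            trial_index += 1
            in_test = trial_index in test_trials
            # Hide the trial from whichever timeline must not see it.
            train.append((time_secs, 0 if in_test else code))
            test.append((time_secs, code if in_test else 0))
        else:
            # Markers that do not identify a trial are kept in both timelines.
            train.append((time_secs, code))
            test.append((time_secs, code))
    return train, test

# Hypothetical timeline: 1000 stands for some non-trial marker,
# 769 and 770 for the two trial classes (assumed values).
timeline = [(0.0, 1000), (5.0, 769), (15.0, 770), (25.0, 769), (35.0, 770)]
train, test = split_timeline(timeline, {769, 770}, test_trials={1, 3})
print(train)
print(test)
```

Whether other markers (trial start, cue, end of trial) also need filtering depends on which stimulations your scenarios react to.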

The general procedure is illustrated in the following figure.

[Figure: The cross-validation as carried out by the external wrapper script included in this post.]


The operating principle

Here’s the idea as pseudocode.

For each .ov file (== BCI recording, dataset, session),

  Extract the timeline of the file (its label stream).

  For each crossvalidation fold:
    Assign the trials (time segments) to belong either to a test or a train fold

  Repeat over the k train/test folds:

    Filter the timeline in two different ways, generating two disjoint
    timeline files. In the TRAINING timeline file, trials related to training
    fold are kept as they are, whereas the test trials are filtered (the stimulation
    code identifying the trial is set to 0). In the TESTING timeline file, the opposite is done.

    Now, run the training scenarios with the TRAINING timeline file ONLY,
    and the test scenario with the TESTING timeline file ONLY. Use the same signal file for both.

    Save the prediction result for each trial (i.e. time, real class, predicted class)

Collect the predictions and compute the cross-validation accuracy
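The fold assignment step of the pseudocode could be sketched in Python as follows. This is an illustration under assumptions, not a port of generate-folding.R: the function names and the `shuffle` and `seed` parameters are invented here, and trials are identified simply by their index in order of appearance.

```python
import random

def assign_folds(n_trials, k, shuffle=True, seed=None):
    """Partition the trial indices 0..n_trials-1 into k disjoint test folds."""
    if not 2 <= k <= n_trials:
        raise ValueError("k must be between 2 and the number of trials")
    trials = list(range(n_trials))
    if shuffle:
        random.Random(seed).shuffle(trials)
    # Cut the (possibly permuted) trial list into k blocks of near-equal size.
    folds = []
    start = 0
    for fold in range(k):
        size = n_trials // k + (1 if fold < n_trials % k else 0)
        folds.append(sorted(trials[start:start + size]))
        start += size
    return folds

def train_test_pairs(n_trials, k, **kwargs):
    """Yield (train_trials, test_trials) for each of the k folds."""
    for test_trials in assign_folds(n_trials, k, **kwargs):
        held_out = set(test_trials)
        train_trials = [t for t in range(n_trials) if t not in held_out]
        yield train_trials, test_trials

for train_trials, test_trials in train_test_pairs(10, 5, shuffle=False):
    print("test:", test_trials, "train:", train_trials)
```

With `shuffle=False` each test fold is a contiguous block of trials in time; with shuffling, temporally adjacent trials can land on opposite sides of the split.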

The actual scripts

Here is wrapper code to perform what is described above. The scripts are written in R, but anybody who understands the general principle can rewrite them in Python or another language of their choice. Downloads:

crossvalidation-wrappers-0.4c (17.Mar.2017) For OpenViBE 1.3.0
crossvalidation-wrappers-0.4b (30.Mar.2015) For OpenViBE 1.0.0

Requirements: R, a free platform for statistical computing (version 2.5.12 or later), and OpenViBE 1.0.0 or later. Some tweaking of the included example scenarios may be required to match your OpenViBE version.

Steps to take

First, edit all the .R scripts in the archive and check that all the paths in them are correct, as well as the tokens that identify trials in your scenarios. For example, in two-class motor imagery these could be OVTK_GDF_Left and OVTK_GDF_Right. R doesn’t know about the tokens, so you need to look up the corresponding numeric values in the OpenViBE stimulation code documentation.
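As an illustration of what such a lookup produces, the Python fragment below hard-codes the two motor imagery tokens. The numeric values are stated to the best of our knowledge and should be treated as assumptions until you have confirmed them against the stimulation code list of your OpenViBE version; the names `TRIAL_CODES` and `is_trial_marker` are made up for this sketch.

```python
# ASSUMED values: verify against the stimulation code list of your OpenViBE version.
TRIAL_CODES = {
    "OVTK_GDF_Left": 0x301,   # 769 in decimal
    "OVTK_GDF_Right": 0x302,  # 770 in decimal
}

def is_trial_marker(code):
    """True if a numeric stimulation code identifies one of the trial classes."""
    return code in TRIAL_CODES.values()

print({name: value for name, value in TRIAL_CODES.items()})
```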

Next, you can launch the following scripts in sequence in R.

# setwd('C:/wherever_you_extracted_the_archive/')
# source('extract-labels.R')

The purpose of this script is to extract the timeline from each recording to a .CSV file. This is needed so that the R scripts can access and manipulate the timelines. Since OpenViBE is a streaming architecture, it would be more difficult to get OpenViBE itself to filter the timelines (at least without writing a custom box).

# source('generate-folding.R')

For k-fold cross-validation, this script creates k train/test .csv file pairs for each dataset you have. These contain the filtered stimulation timelines. The train file should be used in the train scenarios in place of the original label stream (the one in the .ov file), and the test file should be used the same way in the test scenarios. Note that the scenarios should not read any stimulations from the original file.

Also, the generation of the folding randomly permutes the trials by default, so trials that are close to each other in time may end up on different sides of the train/test split, and the result may again be biased by closeness in time. If you wish to avoid this behavior, simply comment out the permuting line in the script.

# source('crossvalidate.R')

This script repeatedly launches Designer to train and test each pair of timelines generated for all the datasets you have. The test scenario should write the classifier predictions to a file, one prediction per trial.

# source('aggregate.R')

After the cross-validation is completed, the aggregating script assembles the results and prints the accuracies.
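The aggregation step might look like the Python sketch below. It assumes the one-result-per-line format that this tutorial specifies for the prediction files (timeSecs, trueClass, predictedClass, votesForPredicted, totalVotes, comma-separated) and pools the trials of all folds before computing accuracy; whether aggregate.R pools in the same way is not stated here, and the function names are hypothetical. The inline strings stand in for the files of two folds.

```python
import csv
import io

def read_predictions(lines):
    """Parse prediction rows into (true_class, predicted_class) pairs.

    Classes are compared as strings, so both columns must be written
    in the same textual form.
    """
    pairs = []
    for row in csv.reader(lines, skipinitialspace=True):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        pairs.append((row[1].strip(), row[2].strip()))
    return pairs

def accuracy(pairs):
    """Fraction of trials whose predicted class equals the true class."""
    if not pairs:
        raise ValueError("no predictions to aggregate")
    correct = sum(1 for true_class, predicted in pairs if true_class == predicted)
    return correct / len(pairs)

# Made-up contents of two per-fold prediction files.
fold1 = "5.0, 769, 769, 12, 16\n15.0, 770, 769, 9, 16\n"
fold2 = "25.0, 769, 769, 14, 16\n\n35.0, 770, 770, 10, 16\n"

pooled = read_predictions(io.StringIO(fold1)) + read_predictions(io.StringIO(fold2))
print("trials:", len(pooled), "accuracy:", accuracy(pooled))
```

To read real files, pass an open file object (opened with `newline=""`) to `read_predictions` instead of the `io.StringIO` wrappers.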

Scenario side

The script archive contains modified motor imagery scenarios as an example. In the scenarios, all the data/model paths have been replaced with configuration tokens, so that crossvalidate.R can specify the actual files to be used each time when it calls Designer. If you want to run these wrapper scripts for other scenarios, you need to modify them accordingly (study the MI scenarios).

To cross-validate your existing scenarios, modify them to take their input from two boxes: Generic Stream Reader, which reads the signal part from the .ov file, and CSV Reader, which reads the filtered labels from the CSV file. The following configuration tokens are declared by the R scripts and should be used to configure the boxes:


Token | Meaning
${User_Signal} | The path to the .ov file that contains the signal to train with
${User_TrainFold} | The path to the .csv file that contains the timeline filtered for training (all training scenarios)
${User_TestFold} | The path to the .csv file that contains the timeline filtered for testing (evaluation/replay scenario)
${User_Model} | Path prefix that scenarios should use to store/load the learned models. Used like ‘${User_Model}_csp.cfg’ and ‘${User_Model}_classifier.xml’.
${User_Prediction} | The path to the file that the predictions should be written to. What the R script expects is one result per line: timeSecs, trueClass, predictedClass, votesForPredicted, totalVotes.
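To show how these tokens might be filled in for each fold, here is a hedged Python sketch that builds the token values and a Designer command line without running anything. The file naming scheme is invented for the example, and the flag names `--no-gui`, `--define` and `--play-fast` are assumptions that should be checked against the command-line help of your OpenViBE Designer version; crossvalidate.R may build its calls differently.

```python
def fold_tokens(dataset, fold, signal_path):
    """Token values for one dataset and fold (the file names are hypothetical)."""
    stem = f"{dataset}-fold{fold:02d}"
    return {
        "User_Signal": signal_path,
        "User_TrainFold": f"folds/{stem}-train.csv",
        "User_TestFold": f"folds/{stem}-test.csv",
        "User_Model": f"models/{stem}",
        "User_Prediction": f"predictions/{stem}.csv",
    }

def designer_command(designer, scenario, tokens, no_gui=True):
    """Build the argument list for one Designer run.

    ASSUMPTION: the flag names below must be verified for your
    OpenViBE version before use.
    """
    command = [designer]
    if no_gui:
        command.append("--no-gui")
    for name, value in tokens.items():
        command += ["--define", name, value]
    command += ["--play-fast", scenario]
    return command

tokens = fold_tokens("subject1", 1, "dataset-converted/subject1.ov")
command = designer_command("openvibe-designer", "train-classifier.xml", tokens)
print(" ".join(command))
```

The resulting list can be handed to `subprocess.run` once the flags have been confirmed; keeping command construction separate from execution makes it easy to inspect what would be launched.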

In the supplied motor imagery scenarios, the script vote.lua performs a majority vote over the classifier predictions that were collected during each trial segment. This is needed because typically a small sliding window is run during the trial, so a single trial results in many predictions. These predictions need to be combined in order to get a single prediction for the trial.
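The voting idea can be sketched in a few lines of Python. This is not a translation of vote.lua; the function name is hypothetical, and the tie-breaking rule (the class that appeared first among the tied ones wins) is a choice made for this example only.

```python
from collections import Counter

def majority_vote(predictions):
    """Return (predicted_class, votes_for_predicted, total_votes) for one trial."""
    if not predictions:
        raise ValueError("a trial needs at least one prediction")
    counts = Counter(predictions)
    # most_common keeps first-seen order among equal counts.
    winner, votes = counts.most_common(1)[0]
    return winner, votes, len(predictions)

# Window-level predictions collected during one hypothetical trial.
print(majority_vote([769, 770, 769, 769]))
```

The returned triple matches the last three fields of the prediction line format (predictedClass, votesForPredicted, totalVotes).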

The other custom script used by the scenarios, motor-imagery-bci-epoch-selector.lua, tries to make sure that the classifier is used to predict only chunks that are inside a trial, not outside it. Note that the parameters of this second Lua script should be compatible with whatever parameters your epoching has: the epoch selector must pass through a bigger time chunk than what your later epoching expects. Otherwise the vote script may not aggregate the predictions correctly.


You can set g_NoGui="" in crossvalidate.R in order to have Designer visible while the cross-validation runs. Make sure that each scenario that is launched completes correctly and that reasonable results appear in the following folders:


Folder | What is expected in it
dataset-converted/ | After extract-labels.R, should contain an .ov and .CSV pair for each dataset
folds/ | After generate-folding.R, should contain .csv files for train and test folds
models/ | After crossvalidate.R has successfully called Designer to run the scenario(s) for the train fold, the trained models should be here (as written by the scenarios, not R).
predictions/ | After crossvalidate.R has successfully called Designer to run the scenario for the test fold, the predictions should appear here (as written by the scenario, not R).



This tutorial described a way to get less biased cross-validation estimates with OpenViBE by avoiding certain problems that arise with streamed data. In particular, it allows all parts of the DSP chain to be trained on the training segments only, and tested on the test segments only. The procedure could clearly be simpler to use, but at the moment there are no near-term plans to build the feature into OpenViBE.

Feedback and discussion are welcome (e.g. on the forum).
