How to cross-validate better

  • NB: Document for OpenViBE 1.0.0 and later


This tutorial discusses potential challenges with cross-validation inside OpenViBE scenarios. It highlights some possible problems and proposes a way to obtain better performance estimates for BCI scenarios, supplying wrapper scripts for this purpose. Using the scripts is not straightforward and requires you to modify the scenarios you use, but the approach can in principle be customized for many different settings. It could serve as a baseline if you need publication-quality cross-validation results.

The scripts of this tutorial are provided as-is. For your personal use case, you will need to understand them and modify them and the related scenarios.

Some possible problems with cross-validation of streamed data

Sometimes users notice that the accuracy predicted by the OpenViBE classifier trainer box appears to exceed the accuracy eventually observed in the online scenario. Although the ‘excitement’ and feedback of the online scenario can change the user performance (e.g. in motor imagery), there are also situations where the reported cross-validation accuracies really are too optimistic.

It should be understood that the cross-validation in the classifier trainer box works correctly in the usual machine learning sense: you give it IID (independent and identically distributed) feature vectors, and you get a correct cross-validation estimate of the classifier performance. However, this box cannot be responsible for what happens outside it. In practice, some OpenViBE scenarios create data in a manner for which the IID assumption clearly does not hold. Some examples follow.

  • The cross-validation in the box cannot take into account any other, possibly preceding, supervised training stages. For example, CSP or xDAWN can compute an overfitting transform of the data and the labels before the classifier training stage takes place. The final evaluated classifier is in this case e.g. lda(filter(data)) == f(g(data)), but the cross-validation only controlled the estimation of the f() part, with data that g() had already strongly overfitted. You can experience this yourself by creating random data X with a high number of channels (e.g. 256), training a CSP on it, and observing almost perfect cross-validation accuracies on the transformed data CSP(X), whereas the true accuracy on fresh data should be at chance level.
  • Since multiple feature vectors are often made from the same trial (e.g. in motor imagery, ‘think left-hand activity for 4 seconds’ is one trial), these can end up in both the testing and the training folds. Overlapping epoching may make this worse. Since the in-trial vectors may be strongly correlated with each other (time-correlated EEG), this makes the cross-validation estimate from the classifier trainer box more optimistic.
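The first of these problems is easy to reproduce outside OpenViBE. The sketch below (plain Python, standard library only) uses a hypothetical stand-in for CSP: a supervised feature-selection step fitted on all the data. Cross-validating only the final classifier afterwards yields above-chance accuracy on pure noise, illustrating why the reported estimate can be too optimistic.

```python
import random

random.seed(0)
n, d = 40, 1000                      # 40 samples of 1000-dimensional pure noise
y = [i % 2 for i in range(n)]
X = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]

def mean(v):
    return sum(v) / len(v)

# Supervised preprocessing fitted on ALL the data -- the g() part above.
# (Picking the feature with the largest class-mean difference is a much
# simpler stand-in for CSP/xDAWN, but it overfits the labels the same way.)
def best_feature(X, y):
    def score(j):
        c0 = [X[i][j] for i in range(len(y)) if y[i] == 0]
        c1 = [X[i][j] for i in range(len(y)) if y[i] == 1]
        return abs(mean(c1) - mean(c0))
    return max(range(len(X[0])), key=score)

j = best_feature(X, y)

# Leave-one-out CV of only the final classifier f() (nearest class mean),
# on data whose labels the selection step has already seen.
hits = 0
for k in range(n):
    tr = [i for i in range(n) if i != k]
    m0 = mean([X[i][j] for i in tr if y[i] == 0])
    m1 = mean([X[i][j] for i in tr if y[i] == 1])
    pred = 1 if abs(X[k][j] - m1) < abs(X[k][j] - m0) else 0
    hits += (pred == y[k])

acc = hits / n
print(acc)   # typically clearly above 0.5, although the data carry no signal
```

A correct estimate would have to redo the feature selection inside every fold, which is exactly what the wrapper scripts described later arrange for the whole DSP chain.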

Yet another type of problem is that the eventual online scenario is not covered by the cross-validation estimate given by the ‘classifier trainer’ box. This leaves room for possible errors/discrepancies in the online scenario signal processing chain, or the ‘classifier processor’ box itself, which will do the actual prediction in the online use.


You can attempt to do cross-validation across trials by explicitly defining, globally, which time segments of the signal belong to the test and train folds, and by using this same segmentation for training all of the supervised learning components in the DSP chain. Finally, you make the predictions with an actual test scenario, using the test segments only, and then aggregate the predictions against the known ground truth.
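The key point is that folds are assigned per trial, not per feature vector, so all epochs cut from one trial land on the same side of the split. A minimal sketch of such an assignment (plain Python; the function and variable names are hypothetical):

```python
import random

def trial_folds(epoch_trial_ids, k, seed=0):
    """Assign every trial wholesale to one of k folds, then give each
    epoch the fold of its parent trial. Epochs cut from the same trial
    can therefore never end up in both the training and the test fold."""
    trials = sorted(set(epoch_trial_ids))
    rng = random.Random(seed)
    rng.shuffle(trials)
    fold_of = {t: i % k for i, t in enumerate(trials)}
    return [fold_of[t] for t in epoch_trial_ids]

# E.g. four overlapping epochs cut from each of six motor-imagery trials:
epoch_trial_ids = [t for t in range(6) for _ in range(4)]
folds = trial_folds(epoch_trial_ids, k=3)
```

Randomly shuffling the trials before the round-robin assignment keeps the folds balanced in size while still mixing trials from different parts of the session.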

The operating principle

For each experiment (session):

  Take the label stream describing the timeline of the experiment.

  For each cross-validation fold:

    Assign each trial (time segment) to either the test or the train fold.

    Filter the timeline in two different ways, generating two disjoint
    timeline files. In the TRAINING timeline file, trials assigned to the
    training fold are kept as they are, whereas the test trials are filtered
    out (stimulation label set to 0). In the TESTING timeline file, the
    opposite is done.

    Now, run the training scenarios on the TRAINING timeline file ONLY,
    and the test scenario on the TESTING timeline file ONLY (the signal file
    will be the same for both).

    Save the result for each trial (i.e. time, real class, majority-voted
    class for the segment/trial).

Summarize all the results.
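The timeline filtering step can be sketched as follows (plain Python; the tutorial's actual scripts do this in R, and the stimulation layout below is a hypothetical simplification, using the common OpenViBE codes 769/770 for left/right hand):

```python
def split_timeline(stims, test_trials):
    """stims: list of (time, trial_id, label) stimulations.
    Returns (train_stream, test_stream): two disjoint label streams
    in which the out-of-fold trials have their label set to 0."""
    train = [(t, lab if tr not in test_trials else 0) for (t, tr, lab) in stims]
    test = [(t, lab if tr in test_trials else 0) for (t, tr, lab) in stims]
    return train, test

stims = [(1.0, 0, 769), (6.0, 1, 770), (11.0, 2, 769), (16.0, 3, 770)]
train_stream, test_stream = split_timeline(stims, test_trials={1, 3})
# train_stream: [(1.0, 769), (6.0, 0), (11.0, 769), (16.0, 0)]
# test_stream:  [(1.0, 0), (6.0, 770), (11.0, 0), (16.0, 770)]
```

Because both streams keep every stimulation time and only zero out the labels, the signal file itself never needs to be cut; the training and test scenarios simply ignore the trials whose label is 0.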

In practice

These scripts provide wrapper code to do what is described above. They are written in R, but anybody who understands the general principle could easily write them in another language, e.g. Python.

Requirements: R, a free platform for statistical computing (version 2.5.12 or later), and OpenViBE 1.0.0 or later.

  • Edit all the R files mentioned below and check that all the paths are correct
  • Extract the trial labels from an OV file to a CSV file, to get a pair: 01-signal.ov, 01-labels.csv. We
    need the labels in CSV to be able to filter them in R. Put these files wherever the datasetFolder
    variable is pointing to. If you have several files, the next should be named 02-signal.ov, 02-labels.csv,
    etc. You’ll need to do this only once per dataset.
  • There’s a convenience script for this

    # source('extract-labels.R')

  • Generate foldings (this creates all the filtered test/train label streams) by running in R

    # source('generate-folding.R')

  • Run the training/testing on all these folds, by running in R

    # source('crossvalidate.R')

  • Finally, to collect & display the results, run in R

    # source('aggregate.R')
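The aggregation step amounts to a majority vote over the epochs of each trial, followed by a comparison against the ground truth. A sketch of the idea (plain Python; aggregate.R is the script that actually does this, and the result layout below is hypothetical):

```python
from collections import Counter

def majority_vote(epoch_preds):
    """Most frequent predicted class among the epochs of one trial."""
    return Counter(epoch_preds).most_common(1)[0][0]

def trial_accuracy(results):
    """results: list of (trial_id, true_class, per-epoch predictions).
    Returns the fraction of trials whose majority vote is correct."""
    hits = sum(majority_vote(preds) == true for _, true, preds in results)
    return hits / len(results)

results = [
    (0, 769, [769, 769, 770, 769]),   # voted 769 -> correct
    (1, 770, [770, 769, 770, 770]),   # voted 770 -> correct
    (2, 769, [770, 770, 769, 770]),   # voted 770 -> wrong
]
acc = trial_accuracy(results)
print(acc)   # 2 of 3 trials correct, i.e. 0.666...
```

Pooling the per-trial votes across all folds and sessions in this way gives a trial-level accuracy that reflects what the online scenario would actually deliver.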

Note that these scripts currently use modified motor imagery scenarios: all the data/model paths in the scenarios have been replaced with configuration tokens, so that crossvalidate.R can specify the files to be used each time. If you want to run these scripts for other scenarios, you need to modify them accordingly (study the MI scenarios).


We have described a way to obtain less biased cross-validation estimates in some circumstances: it avoids the problems described above and allows all parts of the DSP chain to be trained on the training segmentation only and tested on the test segments only. Clearly, the procedure could be simpler to use. We are currently investigating how more appropriate cross-validation could be integrated into OpenViBE for more straightforward use.

Feedback and discussion are welcome (e.g. on the forum).

Doc version 0.4b, 30.March.2015

This entry was posted in Data-analysis, Knowledge base.