Classifier trainer

Summary

  • Plugin name : Classifier trainer
  • Version : 1.0
  • Author : Yann Renard
  • Company : INRIA/IRISA
  • Short description : Generic classifier trainer, relying on several box algorithms
  • Documentation template generation date : Oct 18 2012

Description

Performs multiple trainings on the feature vector set, leaving out a single feature vector each time, and tests this feature vector on the trained classifier.

The Classifier Trainer box is a generic box for classification training purposes. It works in conjunction with the Classifier processor box. This box's role is to expose a generic interface to the rest of the BCI pipelines. The tasks specific to a given classifier are forwarded to one of the registered OVTK_TypeId_ClassifierAlgorithm algorithms. The behavior is simple: the box collects a number of feature vectors. Those feature vectors are labelled depending on the input they arrive on. When a specific stimulation arrives, a training process is triggered. This process can take some time, so this box should be used offline. Depending on the settings you enter, you will be able to perform a k-fold test in order to train a better classifier. When the training stimulation is received, the box requests the selected classification algorithm to generate a configuration file that will be usable online by the Classifier processor box. Finally, the box sends a particular stimulation (OVTK_StimulationId_TrainCompleted) on its output, which can be used to trigger further processing in the scenario.
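The collect, label and trigger behavior described above can be sketched in a few lines of Python. This is only an illustration and not the OpenViBE API: the class TrainerSketch, its methods and the train_fn callback are hypothetical names, and stimulations are represented as plain strings for readability.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# Stimulation names taken from this page; using strings is an assumption of the sketch.
TRAIN = "OVTK_StimulationId_Train"
TRAIN_COMPLETED = "OVTK_StimulationId_TrainCompleted"


@dataclass
class TrainerSketch:
    """Hypothetical stand-in for the box, for illustration only."""
    train_fn: Callable[[list, list], Any]  # hypothetical: (vectors, labels) -> model
    vectors: list = field(default_factory=list)
    labels: list = field(default_factory=list)
    model: Any = None

    def on_feature_vector(self, input_index, vector):
        # Input 0 carries stimulations; feature inputs start at 1,
        # so the input index is used directly as the class label.
        self.vectors.append(vector)
        self.labels.append(input_index)

    def on_stimulation(self, stimulation):
        # Only the configured train trigger matters; other stimulations are ignored.
        if stimulation != TRAIN:
            return None
        self.model = self.train_fn(self.vectors, self.labels)
        # The real box would also save a configuration file at this point.
        return TRAIN_COMPLETED
```

In this sketch, feature vectors accumulate until the trigger arrives, which mirrors why the box is meant for offline use: nothing is trained before the trigger, and training runs on everything collected so far.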

Inputs

This box can have a variable number of inputs. If you need more than two classes, feel free to add more inputs and to use a classifier algorithm able to classify more than two classes.

1. Stimulations

The first input receives a stimulation stream. Only one stimulation of this stream matters: the one that triggers the training process. When this stimulation is received, all the feature vectors are labelled and sent to the classification algorithm, and the training is executed. The classification algorithm then generates a configuration file that will be used online by the Classifier processor box.

  • Type identifier : Stimulations (0x6f752dd0, 0x082a321e)

2. Features for class 1

This input receives the feature vectors for the first class.

  • Type identifier : Feature vector (0x17341935, 0x152ff448)

3. Features for class 2

This input receives the feature vectors for the second class.

  • Type identifier : Feature vector (0x17341935, 0x152ff448)

Outputs

1. Train-completed Flag

The stimulation OVTK_StimulationId_TrainCompleted is raised on this output when the classifier trainer has finished its job.

  • Type identifier : Stimulations (0x6f752dd0, 0x082a321e)

Settings

The number of settings of this box can vary depending on the classification algorithm you choose. Such an algorithm may have specific input OpenViBE::Kernel::IParameter objects (see OpenViBE::Kernel::IAlgorithmProxy for details). If the type of those parameters is simple enough to be handled in the GUI, then additional settings will be added to this box. For this to take effect, you have to close and reopen the settings configuration dialog after the classification algorithm is chosen. Supported parameter types are: Integers, Floats, Enumerations, Booleans. Those parameters cannot be documented on this page because it is impossible to know in advance which classifier, and thus which hyperparameters, you will have available. This depends on the classification algorithms that are implemented in OpenViBE.

1. Classifier to use

The first setting of this box is the classifier to use. You can choose any registered OVTK_TypeId_ClassifierAlgorithm algorithm you want.

  • Type identifier : Classification algorithm (0x21ce7f37, 0x28def186)
  • Default value : [ Support Vector Machine (SVM) ]

2. Filename to save configuration to

This setting points to the configuration file in which the result of the training is saved for later online use. This configuration file is used by the Classifier processor box. Its syntax depends on the selected algorithm.

  • Type identifier : Filename (0x330306dd, 0x74a95f98)
  • Default value : [ ]

3. Train trigger

This is the stimulation that triggers the training process.

  • Type identifier : Stimulation (0x2c132d6e, 0x44ab0d97)
  • Default value : [ OVTK_StimulationId_Train ]

4. Number of partitions for k-fold test

If you want to perform a k-fold test, you should enter a value other than 0 or 1 here. A k-fold test generally allows better classification rates. The idea is to divide the set of feature vectors into a number of partitions. The classification algorithm is trained on some of the partitions and its accuracy is tested on the others. The classifier with the best results is selected as the trained classifier. See the Miscellaneous section for details on how the k-fold test is done in this box.

  • Type identifier : Integer (0x007deef9, 0x2f3e95c6)
  • Default value : [ 10 ]

5. Epsilon

  • Type identifier : Float (0x512a166f, 0x5c3ef83f)
  • Default value : [ 0.100000 ]

6. Weight

  • Type identifier : String (0x79a9edeb, 0x245d83fc)
  • Default value : [ ]

7. SVM type

  • Type identifier : SVM Type (0x2af426d1, 0x72fb7bac)
  • Default value : [ C-SVC ]

8. Degree

  • Type identifier : Integer (0x007deef9, 0x2f3e95c6)
  • Default value : [ 3 ]

9. Kernel type

  • Type identifier : SVM Kernel Type (0x54bb0016, 0x6aa27496)
  • Default value : [ Linear ]

10. Weight Label

  • Type identifier : String (0x79a9edeb, 0x245d83fc)
  • Default value : [ ]

11. Epsilon tolerance

  • Type identifier : Float (0x512a166f, 0x5c3ef83f)
  • Default value : [ 0.001000 ]

12. Cost

  • Type identifier : Float (0x512a166f, 0x5c3ef83f)
  • Default value : [ 1.000000 ]

13. Cache size

  • Type identifier : Float (0x512a166f, 0x5c3ef83f)
  • Default value : [ 100.000000 ]

14. Gamma

  • Type identifier : Float (0x512a166f, 0x5c3ef83f)
  • Default value : [ 0.000000 ]

15. Nu

  • Type identifier : Float (0x512a166f, 0x5c3ef83f)
  • Default value : [ 0.500000 ]

16. Shrinking

  • Type identifier : Boolean (0x2cdb2f0b, 0x12f231ea)
  • Default value : [ true ]

17. Coef 0

  • Type identifier : Float (0x512a166f, 0x5c3ef83f)
  • Default value : [ 0.000000 ]

Examples

This box is used in BCI pipelines in order to classify cerebral activity states. For a detailed scenario using this box and its associated Classifier processor, please see the motor imagery BCI scenario in the sample scenarios.

Miscellaneous

In this section, we will detail how the k-fold test is implemented in this box. For the k-fold test to be performed, you have to choose more than 1 partition in the related setting. Suppose you chose n partitions. Then, when the trigger stimulation is received, the feature vector set is split into n consecutive segments. The classification algorithm is trained on n-1 of those segments and tested on the remaining one. This is performed for each segment. Then the classifier with the best accuracy is chosen.

For example, suppose you have 5 partitions of feature vectors (FVs)

+------+ +------+ +------+ +------+ +------+
| FVs1 | | FVs2 | | FVs3 | | FVs4 | | FVs5 |
+------+ +------+ +------+ +------+ +------+

For the first training, a feature vector set is built from FVs2, FVs3, FVs4 and FVs5. The classifier algorithm is trained on this feature vector set. Then the classifier is tested on FVs1 :

+------+ +---------------------------------+
| FVs1 | |  Training Feature Vector Set 1  |
+------+ +---------------------------------+

Then, a feature vector set is built from FVs1, FVs3, FVs4 and FVs5. The classifier algorithm is trained on this feature vector set. Then the classifier is tested on FVs2 :

+------+ +------+ +------------------------+
|Traini| | FVs2 | |ng Feature Vector Set 2 |
+------+ +------+ +------------------------+

The same process is performed on all the partitions :

+---------------+ +------+ +---------------+
|Training Featur| | FVs3 | |e Vector Set 3 |
+---------------+ +------+ +---------------+
+------------------------+ +------+ +------+
|Training Feature Vector | | FVs4 | |Set 4 |
+------------------------+ +------+ +------+
+---------------------------------+ +------+
|  Training Feature Vector Set 5  | | FVs5 |
+---------------------------------+ +------+
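The partitioning shown in the diagrams can be sketched as follows. This is a minimal illustration under stated assumptions, not the box's actual source: the function names, the train_fn and accuracy_fn callbacks, and the way segment boundaries are rounded when the set does not divide evenly are all hypothetical.

```python
def consecutive_folds(n_items, n_partitions):
    """Yield (test_indices, train_indices) for each of the consecutive segments."""
    if n_partitions < 2:
        raise ValueError("a k-fold test needs at least 2 partitions")
    for k in range(n_partitions):
        # Segment k covers [start, stop); integer rounding is an assumption here.
        start = k * n_items // n_partitions
        stop = (k + 1) * n_items // n_partitions
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_items))
        yield test, train


def best_of_k_fold(vectors, labels, n_partitions, train_fn, accuracy_fn):
    """Train once per segment and keep the classifier with the best test accuracy."""
    best_model, best_accuracy = None, -1.0
    for test, train in consecutive_folds(len(vectors), n_partitions):
        model = train_fn([vectors[i] for i in train], [labels[i] for i in train])
        accuracy = accuracy_fn(
            model, [vectors[i] for i in test], [labels[i] for i in test]
        )
        if accuracy > best_accuracy:
            best_model, best_accuracy = model, accuracy
    return best_model, best_accuracy
```

With 10 feature vectors and 5 partitions, the second fold tests on indices 2 and 3 and trains on the 8 remaining indices, which corresponds to the FVs2 diagram above.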

Important things to consider :

  • The more partitions you have, the more feature vectors you have in your training sets... and the fewer examples you'll have to test on. This means that the result of the test will probably be less reliable. But you will be able to choose the best classifier from a larger list.
  • The fewer partitions you have, the fewer feature vectors you have in your training sets... and the more examples you'll have to test on. This means that the online use of the trained classifier is more likely to be consistent with the trained classifier accuracy.

In conclusion, be careful when choosing this k-fold test setting. Typical values range from 4 partitions (train on 75% of the feature vectors and test on 25% - 4 times) to 10 partitions (train on 90% of the feature vectors and test on 10% - 10 times).
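The percentages quoted above follow directly from the number of partitions: each fold tests on 1/n of the feature vectors and trains on the rest. A small helper (the name split_percentages is hypothetical) makes the trade-off easy to check for other values:

```python
def split_percentages(n_partitions):
    """Return (train_percent, test_percent) for one fold of an n-partition test."""
    test = 100.0 / n_partitions
    return 100.0 - test, test


print(split_percentages(4))   # (75.0, 25.0)
print(split_percentages(10))  # (90.0, 10.0)
```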