Outlier removal

Summary

  • Plugin name : Outlier removal
  • Version : 1.0
  • Author : Jussi T. Lindgren
  • Company : Inria
  • Short description : Discards feature vectors with extremal values
  • Documentation template generation date : Jan 9 2018

Description

Simple outlier removal based on quantile estimation

The outlier removal box discards feature vectors that contain extremal values. The user specifies the desired quantile limits [min,max], each in [0,1]. The algorithm loops through the feature dimensions and computes the range r(j)=[quantile(min),quantile(max)] for each dimension j from the values observed in that dimension. If every feature j of example i is inside r(j), example i is kept; otherwise it is discarded. The box should be sent all the vectors of interest before it is given the stimulation that starts the removal.
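The per-dimension procedure described above can be sketched as follows. This is a minimal illustration, not the box's actual implementation: the function names, the linearly interpolated quantile estimate, and the inclusive range test are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def quantile(sorted_values: Sequence[float], q: float) -> float:
    """Quantile of an ascending-sorted, non-empty sequence.

    Linear interpolation between neighbouring values is an assumption;
    the box may estimate quantiles differently.
    """
    position = q * (len(sorted_values) - 1)
    low = int(position)
    high = min(low + 1, len(sorted_values) - 1)
    fraction = position - low
    return sorted_values[low] + fraction * (sorted_values[high] - sorted_values[low])


def remove_outliers(vectors: List[Sequence[float]],
                    lower: float = 0.01,
                    upper: float = 0.99) -> List[Sequence[float]]:
    """Keep only the vectors whose every feature j lies inside r(j)."""
    if not vectors:
        return []
    n_dims = len(vectors[0])
    # r(j) = [quantile(lower), quantile(upper)] for each dimension j.
    ranges: List[Tuple[float, float]] = []
    for j in range(n_dims):
        column = sorted(v[j] for v in vectors)
        ranges.append((quantile(column, lower), quantile(column, upper)))
    # Inclusive bounds are assumed, so that [0,1] keeps every vector.
    return [v for v in vectors
            if all(lo <= x <= hi for x, (lo, hi) in zip(v, ranges))]


# Hypothetical data: the last vector is extremal in the first dimension.
data = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [100.0, 4.0]]
kept = remove_outliers(data, lower=0.0, upper=0.75)
```

Note that a vector is dropped as soon as one of its features leaves the range of that dimension, so the fraction of discarded vectors can exceed what the quantile limits suggest for a single dimension.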

Inputs

1. Input stimulations

The stimulation to start the removal.

  • Type identifier : Stimulations (0x6f752dd0, 0x082a321e)

2. Input features

The feature vectors to prune.

  • Type identifier : Feature vector (0x17341935, 0x152ff448)

Outputs

1. Output stimulations

The stimulation to announce that the removal is complete.

  • Type identifier : Stimulations (0x6f752dd0, 0x082a321e)

2. Output features

The kept feature vectors.

  • Type identifier : Feature vector (0x17341935, 0x152ff448)

Settings

1. Lower quantile

Lower quantile threshold. In [0,1].

  • Type identifier : Float (0x512a166f, 0x5c3ef83f)
  • Default value : [ 0.01 ]

2. Upper quantile

Upper quantile threshold. In [0,1].

  • Type identifier : Float (0x512a166f, 0x5c3ef83f)
  • Default value : [ 0.99 ]

3. Start trigger

Stimulation that triggers the removal. The same stimulation is passed to the output once the removal is complete.

  • Type identifier : Stimulation (0x2c132d6e, 0x44ab0d97)
  • Default value : [ OVTK_StimulationId_Train ]

Examples

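Choosing [0.02,0.95] sets the per-dimension limits at the 2% and 95% quantiles. A vector is discarded if any of its features falls below the 2% quantile or above the 95% quantile of that dimension, that is, roughly among the lowest 2% or the highest 5% of the values.

If the quantile range is specified as [0,1], the box outputs the original vector set unchanged.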
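To make the two choices above concrete, the following sketch computes the limits for a single feature dimension holding the hypothetical values 0, 1, ..., 100. The helper name and the linearly interpolated quantile estimate are assumptions for illustration; the box's own estimator may give slightly different limits.

```python
def quantile_limits(values, lower, upper):
    """Return (low, high) limits for one feature dimension.

    Uses a linearly interpolated quantile, which is an assumption
    and not necessarily the estimator used by the box.
    """
    ordered = sorted(values)

    def at(q):
        position = q * (len(ordered) - 1)
        low = int(position)
        high = min(low + 1, len(ordered) - 1)
        return ordered[low] + (position - low) * (ordered[high] - ordered[low])

    return at(lower), at(upper)


# Hypothetical dimension with the values 0, 1, ..., 100.
values = [float(v) for v in range(101)]

# [0.02,0.95]: limits close to 2 and 95.
low_limit, high_limit = quantile_limits(values, 0.02, 0.95)

# [0,1]: limits equal the minimum and maximum, so nothing is discarded.
full_low, full_high = quantile_limits(values, 0.0, 1.0)
```

With these limits, the values below about 2 and above about 95 would cause their vectors to be discarded, while the [0,1] setting spans the whole observed range.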

Miscellaneous

The box can be used to try to remove artifacts when training classifiers that are sensitive to extremal values, for example LDA. In band-power based Motor Imagery, eye blinks can cause very strong band powers, which can then bias the classifier training. With proper control of the upper quantile of this box, such examples can be pruned from the training set.

An intuitive way to think about the filtering done by the box is to imagine an axis-aligned hyperrectangle (a box) in the data space. The boundaries of the hyperrectangle correspond to the estimated quantiles in each dimension. Each feature vector that is fully inside it is kept.

It may be difficult to choose meaningful quantile limits without looking at the feature values, which can be inspected with Signal Display. It is also possible to have outliers that are not extremal in any way. Such outliers may be wrongly placed in the feature space or carry a wrong class label. This box cannot catch such problems.