Summary
- Plugin name : Outlier removal
- Version : 1.0
- Author : Jussi T. Lindgren
- Company : Inria
- Short description : Discards feature vectors with extremal values
- Documentation template generation date : May 20 2016
Description
Simple outlier removal based on quantile estimation
The outlier removal box discards feature vectors that contain extremal values. The user specifies the desired quantile limits [min,max]. The algorithm loops through the feature dimensions and computes the range r(j)=[quantile(min),quantile(max)] for each dimension j. Example i is kept if every one of its features j lies inside r(j); otherwise it is discarded. The box should receive all the vectors of interest before it receives the stimulation that starts the removal.
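The per-dimension procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the box's actual implementation: the function names `quantile` and `remove_outliers` are hypothetical, and the simple sorted-index rule used to estimate a quantile is an assumption, since this page does not specify the estimator.

```python
import math


def quantile(sorted_values, q):
    """Return the value at quantile q (in [0, 1]) of an already sorted list.

    Assumption: a simple sorted-index rule; the real box may estimate
    quantiles differently.
    """
    n = len(sorted_values)
    idx = min(n - 1, max(0, math.ceil(q * n) - 1))
    return sorted_values[idx]


def remove_outliers(vectors, q_min=0.01, q_max=0.99):
    """Keep the vectors whose every feature lies inside its range r(j)."""
    if not vectors:
        return []
    n_dims = len(vectors[0])
    # r(j) = [quantile(min), quantile(max)], computed for each dimension j.
    ranges = []
    for j in range(n_dims):
        column = sorted(v[j] for v in vectors)
        ranges.append((quantile(column, q_min), quantile(column, q_max)))
    # Example i is kept only if all of its features are inside their ranges.
    return [
        v for v in vectors
        if all(lo <= x <= hi for x, (lo, hi) in zip(v, ranges))
    ]
```

In this sketch the whole set of vectors has to be available before any range can be computed, which matches the requirement that the box receives all vectors before the start stimulation.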
Inputs
1. Input stimulations
The stimulation to start the removal.
- Type identifier : Stimulations (0x6f752dd0, 0x082a321e)
2. Input features
The feature vectors to prune.
- Type identifier : Feature vector (0x17341935, 0x152ff448)
Outputs
1. Output stimulations
The stimulation to announce that the removal is complete.
- Type identifier : Stimulations (0x6f752dd0, 0x082a321e)
2. Output features
The kept feature vectors.
- Type identifier : Feature vector (0x17341935, 0x152ff448)
Settings
1. Lower quantile
Lower quantile threshold. In [0,1].
- Type identifier : Float (0x512a166f, 0x5c3ef83f)
- Default value : [ 0.01 ]
2. Upper quantile
Upper quantile threshold. In [0,1].
- Type identifier : Float (0x512a166f, 0x5c3ef83f)
- Default value : [ 0.99 ]
3. Start trigger
Stimulation that starts the removal. The box also sends it out once the removal is complete.
- Type identifier : Stimulation (0x2c132d6e, 0x44ab0d97)
- Default value : [ OVTK_StimulationId_Train ]
Examples
Choosing [0.02,0.95] discards, per dimension, the vectors whose feature value falls below the 2% quantile or above the 95% quantile, that is, roughly the lowest 2% and the highest 5% of the feature values.
If the quantile range is specified as [0,1], the box will pass out the original vector set.
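Because a vector is discarded as soon as any one of its features falls outside its range, the overall share of discarded vectors can be larger than the per-dimension share. The following rough sketch computes simple bounds on that share. It assumes that the values in each dimension are distinct and that the estimated quantiles cut off exactly the requested shares. The helper `discard_bounds` is hypothetical and not part of the box.

```python
def discard_bounds(q_min, q_max, n_dims):
    """Return (lower, upper) bounds on the fraction of discarded vectors.

    Assumption: each dimension removes a share q_min from the bottom and
    1 - q_max from the top. The lower bound is reached when every
    dimension rejects the same vectors, the upper bound when the
    dimensions reject disjoint sets of vectors.
    """
    per_dim = q_min + (1.0 - q_max)
    return per_dim, min(1.0, n_dims * per_dim)


# Quantile limits [0.02, 0.95] applied to 3 feature dimensions.
lower, upper = discard_bounds(0.02, 0.95, 3)
```

For the limits [0.02,0.95], each dimension rejects about 7% of the values, so with three dimensions between about 7% and about 21% of the vectors would be discarded under these assumptions. With [0,1] both bounds are zero.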
Miscellaneous
The box can be used to try to remove artifacts when training classifiers that are sensitive to extremal values, for example LDA. In band-power based Motor Imagery, eye blinks can cause very strong band powers, which can then bias the classifier training. With a suitable choice of the upper quantile of this box, such examples can be pruned from the training set.
An intuitive way to think about the filtering done by the box is to imagine a hyperrectangle in the data space. Its boundaries in each dimension correspond to the estimated quantiles of that dimension. Each feature vector that lies fully inside the hyperrectangle is kept.
It may be difficult to choose meaningful quantile limits without looking at the feature values. One can try to inspect them with Signal Display. It is also possible to have outliers that are not extremal in any dimension. Such outliers can be wrongly placed in the feature space or have a wrong associated class label. This box cannot catch such problems.