Re: Classifier Processor Usage with more than 2 classes.

Posted: Fri May 06, 2011 4:53 pm
by ddvlamin
Ok, that clarifies a lot. I thought the algorithms were ready for multiclass, but somehow the "add settings" option was not available in the processing box. For the one-versus-all scheme there's always the possibility to create different binary classifiers and use the voting box. Nevertheless, I just wondered if there was something wrong with my update.
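
The one-versus-all workaround described here can be sketched roughly as follows. This is plain Python for illustration only, not OpenViBE code: the names `one_vs_all_predict` and `make_scorer` are made up, the toy scorers are invented, and the "vote" is assumed to simply pick the binary classifier with the highest score, which may differ from what the voting box actually does.

```python
from typing import Callable, List, Sequence

# Hypothetical type: a binary scorer returns a real value; a larger value means
# more confidence that the sample belongs to "its" class rather than the rest.
BinaryScorer = Callable[[Sequence[float]], float]


def one_vs_all_predict(scorers: Sequence[BinaryScorer], x: Sequence[float]) -> int:
    """Return the index of the binary classifier with the highest score."""
    scores = [scorer(x) for scorer in scorers]
    return max(range(len(scores)), key=lambda k: scores[k])


def make_scorer(centre: Sequence[float]) -> BinaryScorer:
    """Toy scorer: negative squared distance to a made-up class centre."""
    return lambda x: -sum((xi - ci) ** 2 for xi, ci in zip(x, centre))


# Three classes, hence three binary "machines".
scorers: List[BinaryScorer] = [
    make_scorer(c) for c in ([0.0, 0.0], [1.0, 0.0], [0.0, 1.0])
]
predicted = one_vs_all_predict(scorers, [0.9, 0.1])
```

With K classes this needs K binary classifiers, and the combined result is again a single label, which is why the question of what else to output comes up.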

Some inner thoughts:

From a design point of view it's indeed very difficult, as there does not seem to be a straightforward way to encode the output of three "machines" (in the one-versus-all, three-class case) into one real output stream.
At this point however, I don't see the compatibility issue. If you output the class label l1 and the probabilities of each class (p1, p2, p3, ...), then in case of a two-class classifier the second output (p1) can be connected directly to the input of the same box just like before, and the third one (p2) simply remains unused? Or do you mean that you will have one output for the label and one output that sends a matrix, or rather a vector, of probabilities? Why not simply use different output connectors per class then? The number of classes will probably never be that high.

Regression algorithms do not have this problem, only one real output :)
Would it not be possible to rank all probability vectors, starting with the object closest to class 0 according to its probabilities and ending with the one closest to class K, then subdivide the interval [0, 1] into parts, associate each probability vector with a single value in that interval, and fit a regression function between the probability vectors and the computed values?
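
To make the idea of squeezing a probability vector into one value in [0, 1] concrete, here is a small sketch. It is a simplified stand-in, not the ranking-plus-regression procedure itself: it assumes the classes are ordered 0..K-1 and uses the normalized expected class index as the scalar. The function name is hypothetical.

```python
from typing import Sequence


def probabilities_to_scalar(probs: Sequence[float]) -> float:
    """Map a probability vector over K ordered classes to a value in [0, 1].

    Assumption (not from the thread): the scalar is the expected class
    index divided by K - 1, so class 0 maps to 0.0 and class K-1 to 1.0.
    """
    k = len(probs)
    if k < 2:
        raise ValueError("need at least two classes")
    total = sum(probs)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    expected_index = sum(i * p for i, p in enumerate(probs)) / total
    return expected_index / (k - 1)


pure_first = probabilities_to_scalar([1.0, 0.0, 0.0])
pure_middle = probabilities_to_scalar([0.0, 1.0, 0.0])
split_ends = probabilities_to_scalar([0.5, 0.0, 0.5])
```

Note that `pure_middle` and `split_ends` come out identical, so a sample that is certainly class 1 and a sample torn between classes 0 and 2 cannot be told apart. Any mapping to a single value loses information of this kind in some form.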

Best regards,
Dieter Devlaminck

Re: Classifier Processor Usage with more than 2 classes.

Posted: Tue May 17, 2011 9:20 pm
by yrenard
Dear Dieter,

I'm not sure what should be done. I think Fabien should be included in this discussion as well ;) - I'll point him to this post.

Yann

Re: Classifier Processor Usage with more than 2 classes.

Posted: Wed May 18, 2011 8:42 am
by fabien.lotte
Hi all,

Let me briefly summarize the problem we have: different classifiers can have very different outputs, which will be used differently by the subsequent boxes. For instance, the LDA output represents the distance to a hyperplane; the Mahalanobis distance classifier output represents the distance to each class prototype, and the selected class is the one minimizing that distance; and the output of probabilistic classifiers (e.g., naive Bayes, hidden Markov models, etc.) represents the probability (likelihood) of each class, and the selected class is the one maximizing that probability.

Indeed, as you mentioned, having a single-value output for a multiclass classifier would not be very convenient (the regression solution you propose, Dieter, would probably work, but it may not be that intuitive or easy to use). So I think we need multiple values. It can be either multiple outputs (one per class, as you suggest, Dieter) or a single output which is a vector of values (one per class as well). To me, the latter seems more flexible (it is easier to add/remove classes, potentially online) and scalable. Indeed, with a vector we can have many many classes although as Dieter mentioned, it is probably not going to happen any time soon. But we never know :-). And maybe with other sensors than surface EEG (e.g., ECoG or intracortical arrays that could potentially be used with OpenViBE), the number of classes could be much higher.

But this still leaves us with another problem: how to represent the classifier output? Having a standardized likelihood per class would indeed be great and would enable many different boxes to use the classifier processor box in the same way, independently of the classifier used. On the other hand, we may also want to use the actual, original output of the classifier (e.g., the distance to the hyperplane or the distance to the class prototype), which will be lost if we only output a standardized likelihood. And I think this original output can also be very useful, as 1) the standardized likelihood for some discriminative classifiers like LDA or SVM is not a proper probability and may be misleading and 2) the original output could be more useful to perform some form of rejection (distance rejection or confusion rejection), particularly, again, for discriminative classifiers like LDA or SVM.

So what we discussed internally was that, maybe, we would need 2 outputs (actually 3): the first one being a stimulation representing the selected class label; the second one being the natural/original/actual classifier output (distance to hyperplane, distance to class prototype, probability, etc.), which can be a single value or a vector and which depends on the classifier used; and the last one being a standardized likelihood for each class, derived from the classifier's natural output. This last output would be a vector and would be independent of the classifier used. The advantage of this is that we have all necessary information as well as a common representation. The drawback is that it increases the number of outputs, so it may confuse users, and it forces classifier algorithm developers to compute a standardized likelihood measure for each classifier (which may not be easy/natural to do for some classifiers). So we are still not sure whether this is the right thing to do. Does anyone have an opinion on that? What do you think, Dieter?
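
As a rough illustration of the three-output proposal, here is a minimal Python sketch for a prototype-distance classifier. It is not OpenViBE code: `ClassifierOutput` and `from_prototype_distances` are hypothetical names, the label is a plain integer rather than a stimulation, and the choice of a softmax over negative distances as the "standardized likelihood" is purely an assumption made for the example.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ClassifierOutput:
    label: int               # output 1: selected class (a stimulation in OpenViBE)
    raw: List[float]         # output 2: natural output, depends on the classifier
    likelihood: List[float]  # output 3: standardized, one value per class


def from_prototype_distances(distances: Sequence[float]) -> ClassifierOutput:
    """Build the three outputs from distances to each class prototype."""
    # Selected class: the one minimizing the distance.
    label = min(range(len(distances)), key=lambda k: distances[k])
    # Assumed standardization: softmax of negative distances (sums to 1).
    smallest = min(distances)
    weights = [math.exp(-(d - smallest)) for d in distances]
    total = sum(weights)
    likelihood = [w / total for w in weights]
    return ClassifierOutput(label=label, raw=list(distances), likelihood=likelihood)


out = from_prototype_distances([2.0, 0.5, 3.0])
```

A box that only needs a common representation would read `out.likelihood`, while a box doing rejection could still look at `out.raw`.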

As for the compatibility with subsequent boxes, in my opinion, I think we should not hesitate to break this compatibility if it enables us to make OpenViBE better, more nicely designed and easier to use. In particular, since OpenViBE is still "relatively" young, we should dare to do these changes now, because it will be more difficult (or impossible) to do it later on. As for the classifier output, I think the main box using the "classifier state" output is the Graz visualization, so it is only one box. And in this case, I think that the Graz visualization box should be adapted to the classifier processor box and not the other way around. So, if needed, let's break the compatibility :-)

Naturally that's just my opinion, and any comment is most welcome!

Re: Classifier Processor Usage with more than 2 classes.

Posted: Mon May 23, 2011 7:27 am
by ddvlamin
fabien.lotte wrote: To me, the latter seems more flexible (it is easier to add/remove classes, potentially online) and scalable.
Indeed, you have a point, it is more flexible.
fabien.lotte wrote:Indeed, with a vector we can have many many classes although as Dieter mentioned, it is probably not going to happen any time soon. But we never know :-). And maybe with other sensors than surface EEG (e.g., ECoG or intracortical arrays that could potentially be used with OpenViBE), the number of classes could be much higher.
At some point, of course, if you have that many states, it may be better to opt for other techniques. When you have access to intracortical recordings you may want to estimate trajectories. In such cases recursive least squares or Kalman estimators are probably the better choice.
fabien.lotte wrote:But this still leaves us with another problem: how to represent the classifier output? Having a standardized likelihood per class would indeed be great and would enable many different boxes to use the classifier processor box in the same way, independently of the classifier used. On the other hand, we may also want to use the actual, original output of the classifier (e.g., the distance to the hyperplane or the distance to the class prototype), which will be lost if we only output a standardized likelihood. And I think this original output can also be very useful, as 1) the standardized likelihood for some discriminative classifiers like LDA or SVM is not a proper probability and may be misleading and
There are some things I still do not understand. It's true that the fitted probabilities are not 100% accurate for discriminative models such as SVM, but what do you mean by misleading? Isn't the information embedded in the probability estimates (per class) equivalent to the information given by the distance to the hyperplane, the distance to the prototype, ...?

Depending on the multi-class scheme, such as one-versus-one, the output will also have very different meanings and sometimes be difficult to handle? Here you of course have additional information compared to probability estimates (pairwise comparisons).
fabien.lotte wrote:2) the original output could be more useful to perform some form of rejection (distance rejection or confusion rejection), particularly, again, for discriminative classifiers like LDA or SVM.
Ok, I think I'm starting to understand the usefulness of the other representation. You mean, for example, that if a point is exceptionally far from the hyperplane you could reject it as an outlier, which is not possible with probabilities?

fabien.lotte wrote:So what we discussed internally was that, maybe, we would need 2 outputs (actually 3) : the first one being a stimulation and representing the selected class label, the second one being the natural/original/actual classifier output (distance to hyperplane, distance to class prototype, probability, etc.) which can be a single value or a vector and which depends on the classifier used and the last one being a standardized likelihood for each class, derived from the classifier natural output. This last output would be a vector and would be independent of the classifier used. The advantage of this is that we have all necessary information as well as a common representation.
Indeed, this seems to be the best choice as it leaves the options open.
fabien.lotte wrote:The drawback is that it increases the number of outputs, so it may confuse users and
As long as it is very well documented, this should not be a problem. Can we expect from people who build their own scenarios that they have basic knowledge of classifiers?
fabien.lotte wrote: it forces classifier algorithm developers to compute a standardized likelihood measure for each classifier (which may not be easy/natural to do for some classifiers).
This is indeed a problem, as I can imagine this is not always easy to do. One could of course restrict the class of classifiers to probabilistic or Bayesian ones (no problem with hyperplane or prototype distances anymore), but that's something you do not want to do from a software development point of view, I guess :) Nevertheless, for most algorithms there's probably a probabilistic counterpart: Bayesian LDA, the relevance vector machine, ...

Is it possible to set a flag in a derived classifier algorithm to inform the base classifier that it is unable to implement the probability estimates and hence disable that output?
fabien.lotte wrote: As for the compatibility with subsequent boxes, in my opinion, I think we should not hesitate to break this compatibility if it enables us to make OpenViBE better, more nicely designed and easier to use. In particular, since OpenViBE is still "relatively" young, we should dare to do these changes now, because it will be more difficult (or impossible) to do it later on. As for the classifier output, I think the main box using the "classifier state" output is the Graz visualization, so it is only one box. And in this case, I think that the Graz visualization box should be adapted to the classifier processor box and not the other way around. So, if needed, let's break the compatibility :-)
I agree, very true.

Best regards,
Dieter Devlaminck

Re: Classifier Processor Usage with more than 2 classes.

Posted: Mon May 23, 2011 11:31 am
by fabien.lotte
ddvlamin wrote:
fabien.lotte wrote:But this still leaves us with another problem: how to represent the classifier output? Having a standardized likelihood per class would indeed be great and would enable many different boxes to use the classifier processor box in the same way, independently of the classifier used. On the other hand, we may also want to use the actual, original output of the classifier (e.g., the distance to the hyperplane or the distance to the class prototype), which will be lost if we only output a standardized likelihood. And I think this original output can also be very useful, as 1) the standardized likelihood for some discriminative classifiers like LDA or SVM is not a proper probability and may be misleading and
There are some things I still do not understand. It's true that the fitted probabilities are not 100% accurate for discriminative models such as SVM, but what do you mean by misleading? Isn't the information embedded in the probability estimates (per class) equivalent to the information given by the distance to the hyperplane, the distance to the prototype, ...?
Maybe misleading is not the right word, but the thing is that discriminative classifiers do not estimate the boundaries of each class, only the boundaries between classes. As such, the probabilities you can derive from them are not proper probabilities. For instance, to get a probability from an SVM or LDA, people generally use the output of the classifier (i.e., the distance to the hyperplane) as input to a sigmoid function (this is what is done in LibSVM, and thus what is done for the SVM in OpenViBE). This means that a point very far from the hyperplane would have a probability of nearly 1, whereas it would clearly be an outlier, as you indeed mentioned.
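
A small numerical sketch of this effect, in plain Python. The function name is hypothetical, and the slope `a` and offset `b` are placeholder values chosen for illustration; in practice such parameters would be fitted on training data rather than fixed by hand.

```python
import math


def sigmoid_probability(distance: float, a: float = 1.0, b: float = 0.0) -> float:
    """Squash a signed distance to the hyperplane into the interval (0, 1).

    `a` and `b` are illustrative placeholders, not fitted values.
    """
    return 1.0 / (1.0 + math.exp(-(a * distance + b)))


on_boundary = sigmoid_probability(0.0)    # point lying on the hyperplane
typical = sigmoid_probability(2.0)        # point at a moderate distance
far_away = sigmoid_probability(50.0)      # point extremely far from the hyperplane
```

The further the point is from the hyperplane, the closer the value gets to 1, so `far_away` looks like the most confident decision of the three even though such a point may well be an artifact or an outlier. The raw distance still carries that information; the squashed value does not.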

ddvlamin wrote: Depending on the multi-class scheme, such as one-versus-one, the output will also have very different meanings and sometimes be difficult to handle? Here you of course have additional information compared to probability estimates (pairwise comparisons).
fabien.lotte wrote:2) the original output could be more useful to perform some form of rejection (distance rejection or confusion rejection), particularly, again, for discriminative classifiers like LDA or SVM.
Ok, I think I'm starting to understand the usefulness of the other representation. You mean, for example, that if a point is exceptionally far from the hyperplane you could reject it as an outlier, which is not possible with probabilities?
Exactly! :-)
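
For illustration, rejection based on the raw output of a two-class hyperplane classifier could look like the sketch below. Everything here is an assumption made for the example: the function name, the two threshold values, and the reading of "confusion rejection" as rejecting points too close to the hyperplane and "distance rejection" as rejecting points implausibly far from it.

```python
def classify_with_rejection(
    distance: float,
    min_margin: float = 0.2,     # hypothetical threshold for confusion rejection
    max_distance: float = 10.0,  # hypothetical threshold for distance rejection
) -> str:
    """Turn a signed distance to the hyperplane into a label or a rejection."""
    magnitude = abs(distance)
    if magnitude < min_margin:
        # Too close to the boundary: both classes are about equally plausible.
        return "rejected: ambiguous"
    if magnitude > max_distance:
        # Exceptionally far from the boundary: likely an outlier.
        return "rejected: outlier"
    return "class 1" if distance > 0 else "class 2"


decisions = [classify_with_rejection(d) for d in (0.05, 1.5, -1.5, 50.0)]
```

The second test cannot be reproduced from a sigmoid-style probability alone, since distances of 15 and 50 both map to values that are practically 1.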
ddvlamin wrote:
fabien.lotte wrote:So what we discussed internally was that, maybe, we would need 2 outputs (actually 3) : the first one being a stimulation and representing the selected class label, the second one being the natural/original/actual classifier output (distance to hyperplane, distance to class prototype, probability, etc.) which can be a single value or a vector and which depends on the classifier used and the last one being a standardized likelihood for each class, derived from the classifier natural output. This last output would be a vector and would be independent of the classifier used. The advantage of this is that we have all necessary information as well as a common representation.
Indeed, this seems to be the best choice as it leaves the options open.
fabien.lotte wrote:The drawback is that it increases the number of outputs, so it may confuse users and
As long as it is very well documented, this should not be a problem. Can we expect from people who build their own scenarios that they have basic knowledge of classifiers?
You're right, I think we could indeed expect that from people building their own scenarios.


ddvlamin wrote:
fabien.lotte wrote: it forces classifier algorithm developers to compute a standardized likelihood measure for each classifier (which may not be easy/natural to do for some classifiers).
This is indeed a problem, as I can imagine this is not always easy to do. One could of course restrict the class of classifiers to probabilistic or Bayesian ones (no problem with hyperplane or prototype distances anymore), but that's something you do not want to do from a software development point of view, I guess :) Nevertheless, for most algorithms there's probably a probabilistic counterpart: Bayesian LDA, the relevance vector machine, ...

Is it possible to set a flag in a derived classifier algorithm to inform the base classifier that it is unable to implement the probability estimates and hence disable that output?
Indeed, we would not want to restrict the types of classifiers for OpenViBE, as long as it is possible, especially since classifiers like standard LDA and SVM are very popular in the BCI community.

I really like your idea of a flag to disable the output if needed, though! Maybe that would be a solution to this problem. Yann (or anyone from the core OpenViBE developer team :-)), do you think it would be technically possible?
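
Conceptually, the flag could work along the lines of the sketch below. This is plain Python and says nothing about how OpenViBE's algorithm or box API is actually structured: `BaseClassifier`, `supports_likelihood`, `process` and `HyperplaneClassifier` are all hypothetical names, and returning `None` stands in for "this output is disabled".

```python
from typing import List, Optional, Sequence, Tuple


class BaseClassifier:
    # Hypothetical flag: a derived classifier sets it to False when it
    # cannot provide standardized likelihood estimates.
    supports_likelihood = True

    def classify(self, features: Sequence[float]) -> Tuple[int, List[float]]:
        """Return (label, raw output). Must be provided by derived classes."""
        raise NotImplementedError

    def likelihood(self, features: Sequence[float]) -> List[float]:
        """Return one standardized likelihood per class, if supported."""
        raise NotImplementedError

    def process(
        self, features: Sequence[float]
    ) -> Tuple[int, List[float], Optional[List[float]]]:
        label, raw = self.classify(features)
        # The third output is only computed when the derived class supports it.
        probs = self.likelihood(features) if self.supports_likelihood else None
        return label, raw, probs


class HyperplaneClassifier(BaseClassifier):
    supports_likelihood = False  # raw distance only, no likelihood output

    def __init__(self, weights: Sequence[float], bias: float) -> None:
        self.weights = list(weights)
        self.bias = bias

    def classify(self, features: Sequence[float]) -> Tuple[int, List[float]]:
        distance = sum(w * x for w, x in zip(self.weights, features)) + self.bias
        return (0 if distance >= 0 else 1), [distance]


label, raw, probs = HyperplaneClassifier([1.0, -1.0], 0.0).process([2.0, 0.5])
```

Downstream code then has to handle the case where the likelihood output is absent, which is the price of not forcing every classifier to implement it.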

Thanks for your input Dieter, really appreciate it!

Re: Classifier Processor Usage with more than 2 classes.

Posted: Sat Sep 30, 2017 4:24 pm
by hkn1304
Gents,

It's been almost 6 years. Any progress on multiclass classifier support?

Thanks.

Re: Classifier Processor Usage with more than 2 classes.

Posted: Mon Oct 02, 2017 7:02 am
by jtlindgren
Hi,

OpenViBE has had multiclass classifiers for a long time, and examples are bundled with the software. In addition, OpenViBE supports strategies for building multiclass classifiers out of binary classifiers. You can get all of these from the Classifier Trainer box. Our apologies if the documentation is not very detailed about this.

If you have a specific question about multiclass classification, please start a new thread.


Best,
Jussi