Seminar (organized by the DI research team)

Gilles BLANCHARD

Universität Potsdam, Institut für Mathematik


Classification in mutual contamination models


Friday, June 7, 2013, at 10:45 a.m. in room RD134


Abstract:

In many real-world classification problems, the set of training examples for each class is contaminated by examples of the other class; in other words, the true training labels are randomly corrupted. This training label noise comes in addition to the usual source of uncertainty in classification, namely the overlap of the class distributions.

Previous theoretical work on this problem assumes that the two classes are separable, that the label noise is independent of the true class label, or that the noise proportions for each class are known. We introduce a general framework for classification with label noise that eliminates these assumptions. Instead, we give assumptions ensuring identifiability and the existence of universally consistent estimators of the optimal risk, of the discrimination rule, and of the unknown contamination proportions.

For any pair of contaminated distributions, there is a unique pair of non-contaminated distributions satisfying the proposed assumptions, and we argue that this solution corresponds in a certain sense to maximal denoising. We also discuss extensions to the multi-class setting, where some additional challenges arise.
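
As a reading aid, the two-class contamination setting described above can be written as a pair of mixtures. This is only a sketch in assumed notation: the symbols P_0, P_1 (the non-contaminated class distributions), \tilde{P}_0, \tilde{P}_1 (the observed, contaminated ones) and \pi_0, \pi_1 (the contamination proportions) do not appear in the announcement and are introduced here for illustration.

```latex
% Assumed notation (not from the announcement):
%   P_0, P_1              : non-contaminated class distributions
%   \tilde{P}_0, \tilde{P}_1 : observed (contaminated) training distributions
%   \pi_0, \pi_1 \in [0,1]   : unknown contamination proportions
\begin{align*}
  \tilde{P}_0 &= (1 - \pi_0)\, P_0 + \pi_0\, P_1, \\
  \tilde{P}_1 &= (1 - \pi_1)\, P_1 + \pi_1\, P_0 .
\end{align*}
```

In this notation, \pi_0 = \pi_1 = 0 is the noise-free case, and the identifiability question is whether (P_0, P_1, \pi_0, \pi_1) can be recovered from the observable pair (\tilde{P}_0, \tilde{P}_1) alone.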


