Seminar (organized by the DI research team)


Universität Potsdam, Institut für Mathematik

Classification in mutual contamination models

Friday, June 7, 2013, at 10:45 a.m. in room RD134

Abstract:

In many real-world classification problems, the set of training examples for each class is contaminated by examples of the other class; in other words, the true training labels are randomly corrupted. This training label noise comes in addition to the usual source of uncertainty in classification, which is due to overlap of the class distributions.
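To make the setting concrete, here is a minimal simulation sketch of two mutually contaminated training samples. Everything specific in it is an assumption chosen for illustration and is not taken from the talk: the clean class distributions are unit-variance Gaussians centered at -1 and +1, and the names `sample_contaminated`, `pi0`, and `pi1` are hypothetical. The overlap of the two Gaussians plays the role of the usual classification uncertainty, while `pi0` and `pi1` control the additional label noise.

```python
import random


def sample_contaminated(n, pi0, pi1, seed=0):
    """Draw n points per observed class under a toy mutual contamination model.

    Assumed clean distributions (illustrative only): P0 = N(-1, 1), P1 = N(+1, 1).
    pi0 is the proportion of class-1 points hidden in the observed class-0 sample;
    pi1 is the proportion of class-0 points hidden in the observed class-1 sample.
    Each returned item is a pair (x, true_label); a learner would only see x.
    """
    rng = random.Random(seed)

    def draw(true_label):
        mean = 1.0 if true_label == 1 else -1.0
        return rng.gauss(mean, 1.0)

    sample0, sample1 = [], []
    for _ in range(n):
        # Observed class 0: drawn from P1 with probability pi0, otherwise from P0.
        y0 = 1 if rng.random() < pi0 else 0
        sample0.append((draw(y0), y0))
        # Observed class 1: drawn from P0 with probability pi1, otherwise from P1.
        y1 = 0 if rng.random() < pi1 else 1
        sample1.append((draw(y1), y1))
    return sample0, sample1


if __name__ == "__main__":
    s0, s1 = sample_contaminated(10000, pi0=0.2, pi1=0.3)
    # Empirical contamination proportions (computable here only because the
    # simulation keeps the true labels, which real data would not provide).
    print(sum(y for _, y in s0) / len(s0))
    print(sum(1 - y for _, y in s1) / len(s1))
```

In this sketch the contamination proportions are inputs; in the problem described here they are unknown and must be estimated from the contaminated samples alone.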

Previous theoretical work on this problem assumes that the two classes are separable, that the label noise is independent of the true class label, or that the noise proportions for each class are known. We introduce a general framework for classification with label noise that eliminates these assumptions. Instead, we give assumptions ensuring identifiability and the existence of universally consistent estimators of the optimal risk, of the discrimination rule, and of the unknown contamination proportions.

For any pair of contaminated distributions, there is a unique pair of non-contaminated distributions satisfying the proposed assumptions, and we argue that this solution corresponds in a certain sense to maximal denoising. We also discuss extensions to the multi-class setting, where some additional challenges arise.



FR SHIC 3272

Collegium UTC/CNRS