Seminar (organized by the DI research team)

Christoph Bernau

Institute for Medical Information Sciences, Biometry, and Epidemiology, Ludwig-Maximilians-University, Munich


Correcting the optimally selected resampling-based error rate: A smooth analytical alternative to nested cross-validation


Tuesday, April 5, 2011, at 3 p.m. in room RD134


Abstract:

Many statistical problems in bioinformatics are high-dimensional binary classification tasks, e.g. the classification of microarray samples into normal and cancer tissues. In this context, statistical learning methods usually incorporate a tuning parameter that adjusts their complexity to the specific data set under examination. Simply reporting the performance of the best tuning parameter value has led to overly optimistic prediction errors being published in the past. A straightforward approach to avoid this "tuning bias" is nested cross-validation (CV). In this talk we address two objectives. Firstly, we develop a new method that corrects for this tuning bias by embedding the tuning problem into a decision-theoretic framework. The method is based on a decomposition of the unconditional error rate that involves the tuning procedure. Our corrected error estimator can be reformulated as a weighted mean of the resampling errors obtained using the different tuning parameter values. In this sense, it can be interpreted as a smooth version of nested CV. The smooth weighting additionally guarantees intuitive bounds for the corrected error. Secondly, we suggest also using bias correction methods to address the bias resulting from the optimal choice of the learning method. The latter bias is particularly relevant to prediction problems based on high-dimensional "omic" data. In the absence of standards, it is indeed common practice to apply several methods successively. This can lead to an optimistic bias similar to the tuning bias if one reports the performance of the optimal method only. We demonstrate the performance of our new method in addressing both types of bias on four microarray cancer data sets and compare it to existing methods. Our main result is that our approach yields intuitively bounded estimates similar to those of nested CV, at a dramatically lower computational price.
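To make the tuning bias and the nested CV baseline mentioned in the abstract concrete, here is a minimal sketch in Python. It is not the speaker's corrected estimator, whose weights are not given here. It only contrasts the naive "report the best tuning value" error with plain nested CV. The k-nearest-neighbour classifier, the one-dimensional noise data and all function names are assumptions chosen for illustration.

```python
import random

def make_noise_data(n_points, seed):
    """Hypothetical data: one random feature, labels independent of it."""
    rng = random.Random(seed)
    return [(rng.random(), rng.randint(0, 1)) for _ in range(n_points)]

def knn_predict(train, x, k):
    """Majority vote among the k nearest training points (k assumed odd)."""
    neighbours = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in neighbours)
    return 1 if 2 * votes > k else 0

def count_errors(train, test, k):
    """Number of misclassified test points for tuning value k."""
    return sum(knn_predict(train, x, k) != y for x, y in test)

def folds(data, n_folds):
    """Yield (train, test) splits; fold i holds every n_folds-th point."""
    for i in range(n_folds):
        test = data[i::n_folds]
        train = [p for j, p in enumerate(data) if j % n_folds != i]
        yield train, test

def cv_error(data, k, n_folds):
    """Ordinary CV error rate for one fixed tuning value k."""
    wrong = sum(count_errors(tr, te, k) for tr, te in folds(data, n_folds))
    return wrong / len(data)

def naive_error(data, ks, n_folds):
    """Optimistic estimate: the smallest CV error over all tuning values."""
    return min(cv_error(data, k, n_folds) for k in ks)

def nested_cv_error(data, ks, n_folds):
    """Nested CV: k is re-selected inside each outer training set."""
    wrong = 0
    for train, test in folds(data, n_folds):
        best_k = min(ks, key=lambda k: cv_error(train, k, n_folds))
        wrong += count_errors(train, test, best_k)
    return wrong / len(data)

if __name__ == "__main__":
    ks = [1, 3, 5, 7, 9]
    data = make_noise_data(40, seed=0)
    print("naive :", naive_error(data, ks, 5))
    print("nested:", nested_cv_error(data, ks, 5))
```

Because the labels are pure noise, no classifier can do better than 50% error in expectation, yet the naive minimum tends to fall below that level when averaged over many simulated data sets. The nested loop also shows where the computational cost comes from: every outer fold repeats a full inner CV for every tuning value.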

Seminars


Tuesday, June 20, 2017

Seminar at 2 p.m. in room GI042 (Blaise Pascal building), presented by Patrice Perny, LIP6.
« Décision interactive sur domaine combinatoire par élicitation incrémentale de préférences ».


Thursday, May 11, 2017

Seminar at 2 p.m. in room GI042 (Blaise Pascal building), presented by Nicolas Maudet, LIP6 (SMA team).
« Current issues in argumentation ».


Tuesday, April 4, 2017

Seminar at 2 p.m. in the lecture hall of the Centre d’Innovation de l’UTC, presented by Xavier LAGORCE, PhD, Head of Computer Vision, Chronocam.
« Chronocam: Event-based cameras for machine vision ».


Thursday, October 27, 2016

Seminar at 2 p.m. in room GI016, presented by Fabien Pfaender, UTSEUS.
« State of Cities - A Massive, Systematic, Data Powered, Comparative Analysis Of Cities ».





FR SHIC 3272

Collegium UTC/CNRS