This page gathers various data sets that I had the opportunity to work on and that were generated during research projects. When possible, we followed the ARFF format to structure the data. Data sets are grouped according to the main task they are intended to address. If you use any of them in your research, please let me know.
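For readers unfamiliar with ARFF, here is a minimal Python sketch of how a dense ARFF file can be read with the standard library only. The function name `read_arff` and the toy sample are hypothetical and are not taken from the files listed on this page. The sketch assumes unquoted attribute names without spaces and does not handle the sparse ARFF variant.

```python
import csv
import io


def read_arff(text):
    """Parse dense ARFF text into (attribute_names, rows of string values)."""
    names, rows, in_data = [], [], False
    for raw in io.StringIO(text):
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and ARFF comments
        if in_data:
            # csv takes care of double-quoted values that contain commas
            row = next(csv.reader([line], skipinitialspace=True))
            rows.append([value.strip() for value in row])
        elif line.lower().startswith("@attribute"):
            # "@attribute <name> <type>"; assumes the name has no spaces
            names.append(line.split()[1])
        elif line.lower().startswith("@data"):
            in_data = True  # every following line is a data row
    return names, rows


# Hypothetical toy content, only meant to show the layout of an ARFF file.
SAMPLE = """% toy example
@relation toy
@attribute length numeric
@attribute pressure numeric
@attribute error {0,1}
@data
1.5,0.2,0
2.0,0.7,1
"""

names, rows = read_arff(SAMPLE)
print(names)
print(rows)
```

To read one of the downloaded files, pass its content to the function, for example `read_arff(open(path).read())`, where `path` is wherever you saved the file. Existing readers such as `scipy.io.arff.loadarff` are likely a better choice for real use.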

Multi-label problems


These data sets come from the DESCRIPT project, which aimed at training people in calligraphy. The task is to predict the errors made by the trainee from the features extracted from an exercise.

Name                          Instances  Features  Labels  Collector    Data set link
Calligraphy (stroke level)    981        17        6       Remy Frenoy  Data file
Calligraphy (exercise level)  94         6         3       Remy Frenoy  Data file

Preference learning problems


This data set contains students' ratings of courses on several aspects of each course, together with a global score, whether the student passed, and whether the course is a humanities course. The data set is currently in French.

Name   Instances  Features  Collector           Data set link
UVWEB  14830      8         UTC student office  Data file

