UMR CNRS 7253

Antoine Bordes
Antoine Bordes
Antoine Bordes
Antoine Bordes
Antoine Bordes

Site Tools


en:olarank

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

en:olarank [2013/04/16 12:01] (current)
bordesan created
Line 1: Line 1:
 +
 +=== Introduction ===
 +//OLaRank// is an online solver of the dual formulation
 +of support vector machines for structured output spaces. Here is displayed its implementation for
 +sequence labelling using the exact and greedy inference schemes. ​
 +In both cases, the per-sequence training time is the same as a perceptron based
 +on the same inference procedure, up to a small multiplicative constant.
 +
 +Comparing the two inference schemes, the greedy version is much faster.
 +It is also amenable to higher order Markov assumptions and performs
 +similarly on test. In comparison to existing algorithms, both versions
 +match the accuracies of batch solvers that use exact inference after a
 +single pass over the training examples.
 +
 +For details, see the paper **Sequence Labelling with SVMs Trained in One Pass** by Antoine Bordes, Nicolas Usunier and Léon Bottou. in proceedings of ECML'​08.
 +
 +=== General Implementation ===
 +
 +We provide an implementation of //​OLaRankGreedy//​ and //​OLaRankExact//​ under the [[http://​www.fsf.org/​licensing/​licenses/​gpl.html|GNU Public License]].
 +
 +** Description **: The source code for //​OLaRankGreedy//​ can be downloaded here: {{en:​olarankgreedy1.0.tar.gz|olarankgreedy1.0.tar.gz}}. The one for //​OLaRankExact//​ can be downloaded here: {{en:​olarankexact1.0.tar.gz|olarankexact1.0.tar.gz}}. They both contain a C++ library implementing the basic operations aswell as an additional program (Ola_rank_greedy / Ola_rank_exact) that can be used to run experiments.((These two programs take training and testing sets together as argument. Nothing is yet available to write the model down to a file after training. Sorry.))
 +
 +The functions can be built by using ''​make''​
 +in the //OLaRank// directories. Calling these programs without argument displays a summary of command line arguments and options. ​
 +
 +** Data format **: the code takes as input files whose format is close to the LibSVM/​SVM<​sup>​light</​sup>/​SVM<​sup>​struct</​sup>​ format.
 +Each token is represented by a line in the following format:
 +
 +  <​line> ​   = <​target>​ <​feature>:<​value>​ ... <​feature>:<​value> ​
 +  <​target> ​ = <int>
 +  <​feature>​ = <​integer> ​
 +  <​value> ​  = <​float>​
 +
 +The target value and each of the feature/​value pairs are separated by a space character. The target value denotes the label of the token. A sequence of tokens is separated from another one by a blank line.
 +
 +** Examples **: The table below displays the results obtained with these implementations on the datasets and using the parameters of the //OLaRank// paper.
 +
 +
 +|  **OLaRankGreedy** ^  Train. Time ^ Test. Accuracy | **OLaRankExact** ^  Train. Time ^ Test. Accuracy ^ 
 +^  OCR  |  1.4sec. ​ |  81.5%  ^  OCR  |  4sec.  |  75.77% ​ | 
 +^  Chunking ​ |  20sec. ​ |  95.81% ​ ^  Chunking ​ |  130sec. ​ |  95.82% ​ | 
 +^  WSJ  |  175sec. ​ |  96.46% ​ ^  WSJ  |  1380sec. ​ |  96.65% ​ |
 +
 +
  

User Tools