UMR CNRS 7253

Antoine Bordes
Antoine Bordes
Antoine Bordes
Antoine Bordes
Antoine Bordes

Site Tools


en:olarank

Introduction

OLaRank is an online solver of the dual formulation of support vector machines for structured output spaces. Here is displayed its implementation for sequence labelling using the exact and greedy inference schemes. In both cases, the per-sequence training time is the same as a perceptron based on the same inference procedure, up to a small multiplicative constant.

Comparing the two inference schemes, the greedy version is much faster. It is also amenable to higher order Markov assumptions and performs similarly on test. In comparison to existing algorithms, both versions match the accuracies of batch solvers that use exact inference after a single pass over the training examples.

For details, see the paper Sequence Labelling with SVMs Trained in One Pass by Antoine Bordes, Nicolas Usunier and Léon Bottou. in proceedings of ECML'08.

General Implementation

We provide an implementation of OLaRankGreedy and OLaRankExact under the GNU Public License.

Description : The source code for OLaRankGreedy can be downloaded here: olarankgreedy1.0.tar.gz. The one for OLaRankExact can be downloaded here: olarankexact1.0.tar.gz. They both contain a C++ library implementing the basic operations aswell as an additional program (Ola_rank_greedy / Ola_rank_exact) that can be used to run experiments.1)

The functions can be built by using make in the OLaRank directories. Calling these programs without argument displays a summary of command line arguments and options.

Data format : the code takes as input files whose format is close to the LibSVM/SVMlight/SVMstruct format. Each token is represented by a line in the following format:

<line>    = <target> <feature>:<value> ... <feature>:<value> 
<target>  = <int>
<feature> = <integer> 
<value>   = <float>

The target value and each of the feature/value pairs are separated by a space character. The target value denotes the label of the token. A sequence of tokens is separated from another one by a blank line.

Examples : The table below displays the results obtained with these implementations on the datasets and using the parameters of the OLaRank paper.

OLaRankGreedy Train. Time Test. Accuracy OLaRankExact Train. Time Test. Accuracy
OCR 1.4sec. 81.5% OCR 4sec. 75.77%
Chunking 20sec. 95.81% Chunking 130sec. 95.82%
WSJ 175sec. 96.46% WSJ 1380sec. 96.65%
1)
These two programs take training and testing sets together as argument. Nothing is yet available to write the model down to a file after training. Sorry.

User Tools