Authors
Thore Graepel,
Ralf Herbrich,
John Shawe-Taylor
Publication date
2000
Description
We provide small sample size bounds on the generalisation error of linear classifiers that are sparse in their dual representation, given by the expansion coefficients of the weight vector in terms of the training data. These results theoretically justify algorithms like the Support Vector Machine, the Relevance Vector Machine and K-nearest-neighbour. The bounds are a-posteriori bounds, to be evaluated after learning when the attained level of sparsity is known. In a PAC-Bayesian style, prior knowledge about the expected sparsity is incorporated into the bounds. The proofs avoid the use of double sample arguments by taking into account the sparsity that leaves unused training points for the evaluation of classifiers. We furthermore give a PAC-Bayesian bound on the average generalisation error over subsets of parameter space that may pave the way for combining sparsity in the expansion coefficients and margin in a single bound. Finally, reinterpreting a mistake bound for the classical perceptron algorithm due to Novikoff, we demonstrate that our new results put classifiers found by this algorithm on a firm theoretical basis.
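The sparsity referred to in the abstract is the number of nonzero expansion coefficients of the weight vector in its dual representation. As a minimal illustrative sketch (not the paper's construction), the following Python code runs a standard kernel perceptron in dual form; the returned coefficient vector alpha plays the role of the expansion coefficients, and its number of nonzero entries is the attained level of sparsity that an a-posteriori bound of this kind would be evaluated with. All function and variable names here are illustrative assumptions.

```python
import numpy as np

def dual_perceptron(X, y, kernel, epochs=10):
    """Kernel perceptron in dual form: the weight vector is the expansion
    w = sum_i alpha_i * y_i * phi(x_i); sparsity = number of nonzero alpha_i."""
    n = len(y)
    alpha = np.zeros(n)
    # Precompute the kernel (Gram) matrix over the training data.
    K = np.array([[kernel(xi, xj) for xj in X] for xi in X])
    for _ in range(epochs):
        mistakes = 0
        for i in range(n):
            # Prediction uses only training points with nonzero coefficients.
            f = np.sum(alpha * y * K[:, i])
            if y[i] * f <= 0:
                alpha[i] += 1.0   # update the expansion coefficient on a mistake
                mistakes += 1
        if mistakes == 0:
            break
    return alpha  # dual expansion coefficients; nonzeros determine sparsity

# Toy usage: linearly separable data with a linear kernel.
X = np.array([[1.0, 1.0], [2.0, 1.5], [-1.0, -1.0], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])
alpha = dual_perceptron(X, y, kernel=lambda a, b: float(np.dot(a, b)))
print("nonzero expansion coefficients:", int(np.count_nonzero(alpha)))
```

Novikoff's classical mistake bound limits how many such updates the perceptron can make on separable data, which is why, as the abstract notes, a sparsity-based bound applies naturally to the classifiers it produces.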