Authors
Nello Cristianini,
John Shawe-Taylor,
Peter Sykacek
Publication date
1998
Description
It is often claimed that one of the main distinctive features of Bayesian learning algorithms for neural networks is that they don't simply output one hypothesis, but rather an entire probability distribution over a hypothesis set: the Bayes posterior. An alternative perspective is that they output a linear combination of classifiers, whose coefficients are given by Bayes' theorem. This can be regarded as a hyperplane in a high-dimensional feature space. We provide a novel theoretical analysis of such classifiers, based on data-dependent VC theory, proving that they can be expected to be large margin hyperplanes in a Hilbert space, and hence to have low effective VC-dimension. We also present an extensive experimental study confirming this prediction. This not only explains the remarkable resistance to overfitting exhibited by such classifiers, but also places them in the same class as other systems, such as Support Vector Machines and AdaBoost, which have similar performance.
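The view described in the abstract, that the Bayes posterior over linear classifiers collapses into a single posterior-weighted hyperplane whose margin can be inspected, can be illustrated with a small numerical sketch. The snippet below is not the authors' code; the toy data, the hypothesis sample, and the sigmoid noise model are all illustrative assumptions, chosen only to show a posterior-weighted linear combination of classifiers behaving as one hyperplane.

```python
# Minimal sketch (assumed setup, not the paper's experiments): a Bayesian
# posterior over random linear classifiers is combined into a single
# hyperplane, and its margin on the training sample is reported.
import numpy as np

rng = np.random.default_rng(0)

# Toy linearly separable data with labels in {-1, +1} (illustrative).
X = rng.normal(size=(40, 2))
y = np.where(X @ np.array([1.0, -0.5]) > 0, 1, -1)

# Hypothesis set: random unit-norm weight vectors, i.e. linear classifiers.
H = rng.normal(size=(500, 2))
H /= np.linalg.norm(H, axis=1, keepdims=True)

# Likelihood of each hypothesis under an assumed sigmoid noise model;
# with a uniform prior, Bayes' theorem makes the posterior proportional to it.
margins = y[None, :] * (H @ X.T)                     # shape (500, 40)
log_lik = np.sum(np.log(1.0 / (1.0 + np.exp(-5.0 * margins))), axis=1)
posterior = np.exp(log_lik - log_lik.max())
posterior /= posterior.sum()

# The Bayes classifier is a posterior-weighted linear combination of the
# hypotheses -- itself a single hyperplane in the same feature space.
w_bayes = posterior @ H
w_bayes /= np.linalg.norm(w_bayes)

# Geometric margin of the combined hyperplane on the training points.
print("minimum margin of the posterior-mean hyperplane:",
      np.min(y * (X @ w_bayes)))
```

On separable toy data like this, the posterior concentrates on hypotheses that classify the sample correctly, so their weighted average tends to sit well inside the version space, which is the intuition behind the abstract's large-margin claim.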