Authors
John Shawe-Taylor,
Peter Sykacek,
Description
It is often claimed that one of the main distinctive features of Bayesian learning algorithms for neural networks is that they don't simply output one hypothesis, but rather an entire probability distribution over a hypothesis set: the Bayes posterior. An alternative perspective is that they output a linear combination of classifiers, whose coefficients are given by Bayes' theorem. This can be regarded as a hyperplane in a high-dimensional feature space. We provide a novel theoretical analysis of such classifiers, based on data-dependent VC theory, proving that they can be expected to be large margin hyperplanes in a Hilbert space, and hence to have low effective VC-dimension. We also present an extensive experimental study confirming this prediction. This not only explains the remarkable resistance to overfitting exhibited by such classifiers, but also co-locates them in the same class as other systems, such as Support …
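To make the construction in the description concrete, here is a minimal toy sketch (not taken from the paper: the Gaussian data, the finite random hypothesis set, and the error-based likelihood with parameter beta are purely illustrative assumptions). It forms a Bayes-posterior-weighted combination of base classifiers and reads it as a single hyperplane whose normal vector is the posterior, in the feature space spanned by the base classifiers' outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two Gaussian blobs with labels in {-1, +1}.
X = np.vstack([rng.normal(-1.0, 1.0, size=(50, 2)),
               rng.normal(+1.0, 1.0, size=(50, 2))])
y = np.concatenate([-np.ones(50), np.ones(50)])

# A finite "hypothesis set": random linear classifiers h_i(x) = sign(w_i . x).
H = rng.normal(size=(200, 2))          # one weight vector per hypothesis
outputs = np.sign(X @ H.T)             # (n_samples, n_hypotheses), entries in {-1, +1}

# Bayes' theorem over the hypothesis set, under an assumed label-noise likelihood
# p(data | h) proportional to exp(-beta * training errors of h).
beta = 1.0                             # illustrative noise parameter
prior = np.full(H.shape[0], 1.0 / H.shape[0])
errors = np.sum(outputs != y[:, None], axis=0)
posterior = prior * np.exp(-beta * errors)
posterior /= posterior.sum()

# The resulting classifier is a posterior-weighted vote of the base classifiers.
# Viewing phi(x) = (h_1(x), ..., h_N(x)) as a feature map, it is a single
# hyperplane in that feature space, with normal vector equal to the posterior.
def phi(x):
    return np.sign(x @ H.T)

def bayes_predict(x):
    return np.sign(phi(x) @ posterior)

print("training accuracy:", np.mean(bayes_predict(X) == y))
```

In this sketch the high-dimensional feature space is simply the vector of base-classifier outputs, so the posterior-weighted vote is literally an inner product with a fixed weight vector; this is only meant to illustrate the "linear combination of classifiers as a hyperplane" viewpoint, not the paper's analysis of its margin or effective VC-dimension.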