Authors
John Shawe-Taylor, Nello Cristianini
Publication date
2000
Publisher
MIT Press
Description
The presence of noise in the data introduces a trade-off in every learning problem: complex hypotheses can be very accurate on the training set, but have worse predictive power than simpler and slightly inaccurate hypotheses. Hence the right balance between accuracy and simplicity of a hypothesis needs to be sought, and this is usually attained by minimizing a cost function formed of two parts, one describing the complexity of the hypothesis, the other measuring its training error. In the case of linear functions this leads to an additional difficulty, as the problem of minimizing the number of training errors is computationally infeasible if we parametrize the problem in terms of the dimension of the inputs (Arora et al., 1997). We avoid this apparent impasse by bounding the generalization in terms of a different function of the training set performance, namely one based on the distribution of margin values, but not directly involving the training error. We will show in this paper that minimizing this new criterion can be performed efficiently. When considering large margin classifiers, where the complexity of a hypothesis is measured by its margin with respect to the data, the presence of noise can lead to further problems: for example, datasets may be non-separable, and hence the margin would be negative, making application of the non-agnostic result impossible. Moreover, solutions found by maximizing the margin are not stable with respect to the training points: slight modifications in the training set can significantly change the hypothesis, a brittleness which makes the maximal margin solution somewhat undesirable. These problems have led …
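To make the two-part cost function concrete, here is a minimal sketch of a soft-margin linear classifier that minimizes a complexity term plus a penalized training-error term. It is a generic illustration, not the margin-distribution criterion introduced in the paper: the function name soft_margin_svm, the subgradient-descent optimizer, and the hyperparameters lr and epochs are all assumptions for the sketch, and labels are assumed to be in {-1, +1}.

```python
import numpy as np

def soft_margin_svm(X, y, C=1.0, lr=0.01, epochs=200):
    """Sketch: minimize 0.5*||w||^2 + C * sum(hinge losses) by subgradient descent.

    The first term measures hypothesis complexity (inverse margin);
    the second penalizes training errors and margin violations.
    Assumes y contains labels in {-1, +1}.
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)        # functional margins y_i (w . x_i + b)
        viol = margins < 1               # points violating the target margin
        # Subgradient of the two-part objective:
        # complexity term contributes w; each violator contributes -C * y_i * x_i
        grad_w = w - C * (y[viol][:, None] * X[viol]).sum(axis=0)
        grad_b = -C * y[viol].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Usage on synthetic linearly separable data with label noise
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
w, b = soft_margin_svm(X, y, C=1.0)
```

The parameter C plays the balancing role described above: a large C weights training accuracy heavily (risking the brittleness of near-maximal-margin solutions), while a small C tolerates some margin violations in exchange for a simpler, more stable hypothesis.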