Authors
S Grünewälder,
JY Audibert,
M Opper,
Publication date
2009
Publisher
Total citations
Description
Metric entropy and generic chaining methods are powerful tools from probability theory that can be used to study pathwise properties of stochastic processes. Despite this fact they have largely been ignored in machine learning. We demonstrate their power in this work in applying them to a bandit problem with a Gaussian process prior. The difficulty of the setting lies in the fact that we are dealing with a continuous space of arms and we need to control the supremum of a reward process on the arms. We apply the so called Dudley integral to reduce the problem of controlling the supremum of a “difficult” stochastic process to the problem of bounding a canonical metric that is based solely on the covariance function (which is an analytical and thus “simple” object). We consider the scenario in which there is no noise in the observed reward. Our main result is to bound the regret experienced by algorithms relative to the a posteriori optimal strategy of playing the best arm throughout based on benign assumptions about the covariance function defining the Gaussian process.