Authors
Louis Dorard, Dorota Glowacka, John Shawe-Taylor
Publication date
2009
Description
Multi-armed bandit problems, named by analogy with slot machines in casinos, are problems in which one must choose actions sequentially (pull arms) in order to maximise a cumulative reward (gain), with no initial knowledge of the arms' reward distributions. We propose a general framework for handling dependencies across arms, based on a new assumption on the mean-reward function: that it is drawn from a Gaussian Process (GP) with a given arm covariance matrix. We show on a toy problem that this allows us to outperform the popular UCB bandit algorithm, which treats arms as independent.
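As a rough illustration of the idea (a sketch, not the paper's exact algorithm), the Python snippet below maintains a joint GP posterior over a finite set of arms and selects arms optimistically, UCB-style. The arm positions, squared-exponential kernel, noise level, and exploration weight beta are all assumed for the example.

import numpy as np

rng = np.random.default_rng(0)

# Toy setup (assumed for illustration): arms live on a line, and nearby
# arms have correlated mean rewards via a squared-exponential covariance.
arm_positions = np.linspace(0.0, 1.0, 20)
K = np.exp(-0.5 * (arm_positions[:, None] - arm_positions[None, :]) ** 2 / 0.1 ** 2)
true_means = rng.multivariate_normal(np.zeros(20), K)  # mean-reward function ~ GP(0, K)
noise_var = 0.05

def pull(a):
    # Pulling arm a yields a noisy reward around its true mean.
    return true_means[a] + rng.normal(0.0, np.sqrt(noise_var))

n = len(arm_positions)
mu = np.zeros(n)   # posterior mean reward per arm
cov = K.copy()     # posterior covariance across arms
beta = 2.0         # exploration weight (assumed value)

for t in range(200):
    # Optimistic arm choice: posterior mean plus a width term, UCB-style.
    a = np.argmax(mu + beta * np.sqrt(np.clip(np.diag(cov), 0.0, None)))
    r = pull(a)
    # Rank-one GP posterior update after observing reward r at arm a.
    k_a = cov[:, a]
    denom = cov[a, a] + noise_var
    mu = mu + k_a * (r - mu[a]) / denom
    cov = cov - np.outer(k_a, k_a) / denom

print("best arm found:", np.argmax(mu), "true best:", np.argmax(true_means))

The rank-one update is standard GP conditioning on a finite domain, and it is what captures the dependency structure: a reward observed at one arm shifts the mean and shrinks the uncertainty of every correlated arm, whereas an independent-arms UCB learner would update only the pulled arm.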