Authors
Louis Dorard, Dorota Glowacka, John Shawe-Taylor
Publication date
2009
Description
Multi-armed bandit problems, named by analogy with slot machines in casinos, are problems in which one must choose actions sequentially (pull arms) in order to maximise a cumulative reward (gain), with no initial knowledge of the arms' reward distributions. We propose a general framework for handling dependencies across arms, based on a new assumption on the mean-reward function: that it is drawn from a Gaussian Process (GP) with a given arm covariance matrix. We show on a toy problem that this allows us to outperform the popular UCB bandit algorithm, which treats arms as independent.
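As a rough illustration of the idea (a sketch, not the paper's exact algorithm), the Python snippet below maintains a joint GP posterior over a finite set of arms and selects arms optimistically, UCB-style. The arm positions, squared-exponential kernel, noise level, and exploration weight beta are all assumed for the example.

import numpy as np

rng = np.random.default_rng(0)

# Toy setup (assumed for illustration): arms live on a line, and nearby
# arms have correlated mean rewards via a squared-exponential covariance.
arm_positions = np.linspace(0.0, 1.0, 20)
K = np.exp(-0.5 * (arm_positions[:, None] - arm_positions[None, :]) ** 2 / 0.1 ** 2)
true_means = rng.multivariate_normal(np.zeros(20), K)  # mean-reward function ~ GP(0, K)
noise_var = 0.05

def pull(a):
    # Pulling arm a yields a noisy reward around its true mean.
    return true_means[a] + rng.normal(0.0, np.sqrt(noise_var))

n = len(arm_positions)
mu = np.zeros(n)   # posterior mean reward per arm
cov = K.copy()     # posterior covariance across arms
beta = 2.0         # exploration weight (assumed value)

for t in range(200):
    # Optimistic arm choice: posterior mean plus a width term, UCB-style.
    a = np.argmax(mu + beta * np.sqrt(np.clip(np.diag(cov), 0.0, None)))
    r = pull(a)
    # Rank-one GP posterior update after observing reward r at arm a.
    k_a = cov[:, a]
    denom = cov[a, a] + noise_var
    mu = mu + k_a * (r - mu[a]) / denom
    cov = cov - np.outer(k_a, k_a) / denom

print("best arm found:", np.argmax(mu), "true best:", np.argmax(true_means))

The rank-one update is standard GP conditioning on a finite domain, and it is what captures the dependency structure: a reward observed at one arm shifts the mean and shrinks the uncertainty of every correlated arm, whereas an independent-arms UCB learner would update only the pulled arm.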