Adam Kalai,
B Kappen,
JY Audibert,
C Szepesvári,
Background: Trading off exploration and exploitation plays a key role in a number of learning tasks. For example, the bandit problem ([1],[2],[3],[4]) provides perhaps the simplest case in which we must decide between pulling the arm that appears most advantageous and experimenting with arms for which we do not have accurate information. Similar issues arise in learning problems where the information received depends on the choices made by the learner. Examples include reinforcement learning and active learning, and related questions appear in other disciplines, such as sequential decision-making in statistics and optimal control in control theory.