PUBLICATIONS

Learning transition dynamics in MDPs with online regression and greedy feature selection

Authors

Guy Lever, Ronnie Stafford, John Shawe-Taylor

Publication date

2014

Total citations

Cited by 1

Description

We present an approach to reinforcement learning in which the system dynamics are modelled using online linear regression between feature spaces, and a compact feature representation for the dynamics model is built incrementally using greedy feature selection. Candidate features are built online using kernels centred at datapoints as they are discovered. We implement the model learning method in a policy iteration scheme. The complexity of each policy iteration (feature learning, model learning, value estimation and policy improvement) is independent of the total amount of data observed, and only linear in the amount of new data added per iteration. The approach therefore scales up to complex problems requiring a huge amount of data to learn well. We validate the approach on benchmark MDPs and simulated quadrocopter navigation.
