OptimalAI
PUBLICATIONS
Learning transition dynamics in MDPs with online regression and greedy feature selection
Authors
Guy Lever
Ronnie Stafford
John Shawe-Taylor
Publication date
2014
Description
We present an approach to reinforcement learning in which the system dynamics are modelled using online linear regression between feature spaces, and a compact feature representation for the dynamics model is built incrementally using greedy feature selection. Candidate features are built online using kernels centred at datapoints as they are discovered. We implement the model learning method in a policy iteration scheme. The complexity of each policy iteration (feature learning, model learning, value estimation and policy improvement) is independent of the total amount of data observed, and only linear in the amount of new data added per iteration. The approach therefore scales up to complex problems requiring a huge amount of data to learn well. We validate the approach on benchmark MDPs and simulated quadrocopter navigation.
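As a rough illustration of the ingredients named in the description, the sketch below pairs a ridge regressor that is updated from running sums (so each update costs time proportional to the new batch only) with a greedy step that picks a Gaussian kernel centre from observed datapoints. This is not the authors' implementation. The names `gaussian_features`, `OnlineRidge` and `greedy_pick`, the Gaussian kernel, the ridge penalty `lam`, the residual-correlation selection score, and the toy one-dimensional dynamics are all assumptions made for the example. For simplicity the regression target here is the raw next state rather than a second feature space.

```python
import numpy as np


def gaussian_features(X, centres, width=1.0):
    """Evaluate Gaussian kernels centred at `centres` on the rows of X."""
    X = np.atleast_2d(X)
    centres = np.atleast_2d(centres)
    sq_dist = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * width ** 2))


class OnlineRidge:
    """Ridge regression kept as running sums (hypothetical helper).

    Storage and solve cost depend on the number of features,
    not on how many samples have been seen so far.
    """

    def __init__(self, d_in, d_out, lam=1e-3):
        self.A = lam * np.eye(d_in)        # regularised sum of Phi^T Phi
        self.B = np.zeros((d_in, d_out))   # sum of Phi^T Y

    def update(self, Phi, Y):
        # Cost is linear in the number of rows in this batch.
        self.A += Phi.T @ Phi
        self.B += Phi.T @ Y

    def weights(self):
        return np.linalg.solve(self.A, self.B)

    def predict(self, Phi):
        return Phi @ self.weights()


def greedy_pick(X, residual, candidates, width=1.0):
    """Index of the candidate centre whose kernel feature is most
    correlated with the current residual (an assumed selection rule)."""
    K = gaussian_features(X, candidates, width)           # (n, m)
    norms = np.linalg.norm(K, axis=0) + 1e-12
    scores = np.linalg.norm(K.T @ residual, axis=1) / norms
    return int(np.argmax(scores))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S = rng.uniform(-3.0, 3.0, size=(200, 1))   # toy 1-D states
    S_next = S + 0.1 * np.sin(S)                # made-up dynamics

    centres = np.empty((0, 1))
    residual = S_next.copy()
    for _ in range(5):
        # Candidate features are kernels centred at observed datapoints.
        j = greedy_pick(S, residual, S)
        centres = np.vstack([centres, S[j]])
        Phi = gaussian_features(S, centres)
        model = OnlineRidge(len(centres), 1)
        model.update(Phi, S_next)
        residual = S_next - model.predict(Phi)
        print(len(centres), float(np.linalg.norm(residual)))
```

Because `OnlineRidge` stores only the two sums, feeding it data in several batches gives the same weights as feeding it everything at once, up to floating-point rounding. The demo refits from scratch each time a feature is added; extending the running sums to a newly added feature without revisiting old data is a detail this sketch does not address.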