Authors
Ronnie Stafford
John Shawe-Taylor
Publication date
2018
Description
We present a novel approach for integrating deep non-linear parametric function approximators into an existing reinforcement learning (RL) control algorithm while maintaining stable policy updates. Actively compressed conditional mean embeddings (ACCME) replaces computationally expensive batch kernel regression with a stochastically trained neural network architecture for learning the kernel weights of a conditional mean embedding (CME) transition model. The embedding model is then used in a model-based dynamic programming (DP) control algorithm. The ACCME variant i) improves the practicality of continually training a CME model in online and data-abundant environments, and ii) maintains a fast-to-evaluate contraction constraint via a sparse softmax activation function. Additionally, we propose a neurobiologically inspired mechanism for adding and removing states from the set of successor states over which the embedding is defined. This contrasts with the original CME and the later compressed CME (CCME) models, which only add new states to the set, a practice that is problematic for maintaining non-parametric value functions in large Markov decision processes (MDPs).
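The contraction constraint mentioned above can be illustrated with a minimal sketch: if the network's output layer passes its logits through a softmax, the resulting kernel weights are non-negative and sum to one, so the DP backup is a gamma-contraction in the sup-norm. The network itself is stubbed out with random logits here; the variable names and the specific reward term are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax: non-negative weights summing to 1.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
n_anchor = 5                        # size of the successor-state dictionary (assumed)
logits = rng.normal(size=n_anchor)  # stand-in for the weight network's output
w = softmax(logits)                 # CME kernel weights over successor states

gamma = 0.9
def backup(V):
    # One DP backup: reward (arbitrary constant here) plus discounted
    # expected next value under the embedding weights.
    return 1.0 + gamma * w @ V

# Because w >= 0 and w.sum() == 1, the backup operator is a
# gamma-contraction: |T V1 - T V2| <= gamma * max|V1 - V2|.
V1 = rng.normal(size=n_anchor)
V2 = rng.normal(size=n_anchor)
assert abs(backup(V1) - backup(V2)) <= gamma * np.abs(V1 - V2).max() + 1e-12
```

A sparse variant (e.g. sparsemax) keeps the same simplex property while zeroing out most weights, which is what makes the constraint cheap to evaluate online.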