PILCO Reproduction | Wouter J. Meijer

The pendulum simulation in Mujuco

Currently working on reproducing the PILCO algorithm for reinforcement learning and doing a robustness analysis.

PILCO used Gausian processes for function approximation. One of the main advantages of PILCO is that is promotes data efficient reinforcement learning. This is achieved through balancing exploration and exploitation. What this means is that it is able to regulate between generating more data far from its current model (exploration) and generating more data nearby its current model (exploitation). This is regulated based on how well the target state is captured in the model.