Unifying State Representation Learning with Intrinsic Motivations in Reinforcement Learning

Finished: 2021-09-30

MSc assignment

Traditionally, robots have worked in environments where they have precise knowledge of everything needed to complete their tasks. Increasingly, robots are asked to complete tasks without full information. To complete such tasks, the robot must be able to learn the information it needs, which it can do using reinforcement learning algorithms. A reinforcement learning algorithm can learn complicated control policies, such as bipedal locomotion, by maximizing simple rewards, such as forward movement.

Reinforcement learning algorithms are affected by the “curse of dimensionality”. A robot that uses high-dimensional observations, such as camera images, as feedback for a reinforcement learning control policy faces an exponential explosion in computation time. This problem can be mitigated by extracting the important features from an image and discarding the rest. For example, an image of an obstacle can be compressed to the 3-dimensional location of the obstacle. These representations can be learned by training neural networks to discard the information in an image that is unnecessary, as sketched in the example below.
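As a minimal illustration of this idea (the specific model choice here is my own assumption, not part of the assignment), a convolutional autoencoder can be trained to reconstruct camera images through a low-dimensional bottleneck; the bottleneck vector then serves as the compact state representation:

```python
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    """Compress a 64x64 RGB camera image into a small latent state vector."""

    def __init__(self, latent_dim: int = 3):
        super().__init__()
        # Encoder: image -> low-dimensional state representation.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=4, stride=2, padding=1),   # 64 -> 32
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2, padding=1),  # 32 -> 16
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 16 * 16, latent_dim),
        )
        # Decoder: representation -> reconstructed image (used only for training).
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64 * 16 * 16),
            nn.ReLU(),
            nn.Unflatten(1, (64, 16, 16)),
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1),  # 16 -> 32
            nn.ReLU(),
            nn.ConvTranspose2d(32, 3, kernel_size=4, stride=2, padding=1),   # 32 -> 64
            nn.Sigmoid(),
        )

    def forward(self, image: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        state = self.encoder(image)           # compact state representation
        reconstruction = self.decoder(state)  # used only for the training loss
        return state, reconstruction

# Minimizing reconstruction error forces the latent state to keep only the
# information needed to rebuild the image and discard the rest.
model = ConvAutoencoder(latent_dim=3)
images = torch.rand(8, 3, 64, 64)  # a batch of hypothetical camera observations
state, recon = model(images)
loss = nn.functional.mse_loss(recon, images)
```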

Currently, these representations are learned by choosing random actions and then observing the consequences. There is an opportunity to learn representations more efficiently by choosing actions that maximize the expected learning. My research will explore how reinforcement learning agents can be given intrinsic rewards so that they learn control policies that maximize the learning of state representations, and I will report the effects of these methods on the learned representations.
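One concrete way the two could be coupled, offered here only as an illustrative assumption about how such an intrinsic reward might be defined, is to reward the agent with the representation model's current reconstruction error on a new observation, so the policy is drawn toward observations the representation has not yet learned:

```python
import torch
import torch.nn as nn

def intrinsic_reward(representation_model: nn.Module, observation: torch.Tensor) -> float:
    """Reward for reaching an observation the representation model still
    reconstructs poorly, i.e. one it has not yet learned to represent.

    `representation_model` is assumed to return (state, reconstruction),
    like the autoencoder sketched above.
    """
    with torch.no_grad():
        _, reconstruction = representation_model(observation)
        error = nn.functional.mse_loss(reconstruction, observation)
    return error.item()

# Hypothetical use inside the RL loop: combine the intrinsic reward with the
# task reward, while continuing to train the representation model online on
# the observations the policy collects.
# reward = task_reward + beta * intrinsic_reward(model, observation_batch)
```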