Development and optimization of a robot navigation algorithm combining deep reinforcement learning and state representation learning

Finished: 2020-02-27

MSc assignment

Nowadays robots are used for increasingly complex tasks. Many of these are navigation tasks, such as mapping a building, moving objects from A to B, or inspecting remote and hard-to-reach areas. Deep Reinforcement Learning (DRL) has proven to be a powerful framework for such tasks. One of the advantages of (deep) reinforcement learning is that behavior does not need to be programmed explicitly; instead, rewards are defined for taking certain actions or reaching specific states. This can greatly reduce development time and allows for behavior that is far more complex than most manually programmed code.
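To make the idea of "defining rewards instead of programming behavior" concrete, here is a minimal sketch of a reward function for a move-from-A-to-B navigation task. All names, constants, and the shaping terms are illustrative assumptions, not part of the project described here.

```python
import math

def navigation_reward(position, goal, collided, goal_radius=0.5):
    """Score one time step of a navigation episode.

    Hypothetical rewards: a large bonus for reaching the goal state,
    a penalty for collisions, and a small per-step cost shaped by the
    remaining distance. The agent's actual path is never specified.
    """
    distance = math.dist(position, goal)
    if collided:
        return -10.0                    # penalize hitting an obstacle
    if distance < goal_radius:
        return 100.0                    # reward for reaching the goal region
    return -0.1 - 0.01 * distance       # step cost, mildly distance-shaped
```

A DRL agent trained against such a function discovers the navigation behavior itself; the developer only states which outcomes are desirable.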

These robots often need to be very versatile and therefore require a lot of sensory input. The rewards that can be specified, however, carry little information and are generally indirect in nature. This makes it hard for DRL to connect the high-dimensional input signal to the uninformative reward signal, so the robot needs a lot of experience in order to fulfill its tasks. In robotics this is undesirable due to the rather high operational costs.

A fairly recent method to combat this problem is to place a network in front of the DRL agent that learns a state representation from the sensory input. This representation has a significantly lower dimension than the raw sensor input and is then fed into the DRL network. An advantage of this is that the DRL agent does not need to learn the state representation as part of its control policy, which could enhance performance and generalization.
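The data flow of this two-stage architecture can be sketched as follows. The dimensions and the use of random (untrained) weights are assumptions made purely for illustration; in the actual project the encoder would be a learned state representation network (e.g. the encoder half of an autoencoder) and the policy head a full DRL network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: a 256-dimensional sensor reading compressed
# to an 8-dimensional state, from which 4 action preferences are computed.
OBS_DIM, STATE_DIM, N_ACTIONS = 256, 8, 4

# State representation network (here: one random linear layer + tanh,
# standing in for a trained encoder).
W_enc = rng.normal(size=(STATE_DIM, OBS_DIM)) / np.sqrt(OBS_DIM)

# DRL policy head: it only ever sees the compact state, never the raw input.
W_pi = rng.normal(size=(N_ACTIONS, STATE_DIM)) / np.sqrt(STATE_DIM)

def encode(observation):
    """Map raw high-dimensional sensory input to a low-dimensional state."""
    return np.tanh(W_enc @ observation)

def policy_logits(observation):
    """Policy operating on the learned state representation."""
    return W_pi @ encode(observation)

obs = rng.normal(size=OBS_DIM)   # a fake high-dimensional sensor reading
state = encode(obs)              # 8-dimensional learned state
logits = policy_logits(obs)      # 4 action preferences
```

The point of the separation is visible in the shapes: the policy's input space shrinks from 256 to 8 dimensions, so the reinforcement learner no longer has to discover a useful representation from the reward signal alone.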

This project will focus on the development and optimization of a state representation (deep) learning method combined with deep reinforcement learning. The framework will first be developed and tested in simulation. In a later stage it can be tested on a real robot in order to assess its performance in sub-optimal environments.

The proposed research has many use cases. Since the method is very general and not task specific, many robotic systems that perform navigation tasks using DRL could benefit from it. This holds predominantly for robots that operate in a constantly changing environment, since the state representation network could greatly enhance generalization. Because many robotic tasks involve changing environments (e.g. a company floor with human interaction, or a building with doors opening and closing), the proposed research shows a lot of potential.