Towards continuous control for mobile robot navigation: A reinforcement learning and SLAM-based approach

Finished: 2019-08-22

MSc assignment

Autonomous navigation of robots in unknown environments to a desired target without colliding with obstacles is an important problem in the field of mobile robots. Traditional methods exist in the literature for the case where complete knowledge of the environment is available.

However, this is rarely the case in real-life applications, where complete knowledge of a stochastic environment can hardly be obtained. For this reason, the main goal of this project is to navigate a mobile robot with non-holonomic constraints through an unknown environment to its desired goal in minimum time while avoiding collisions with obstacles. In the context of autonomous cart navigation, this navigation task in unstructured environments can be formulated as a reinforcement learning (RL) problem.
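As a minimal sketch of one possible formulation (the dimensions, bounds, and helper names below are illustrative assumptions, not the project's final design), the state can stack the laser range readings with the goal's polar coordinates in the robot frame, and the action can be the robot's continuous linear and angular velocity, which is where the non-holonomic constraint enters: the cart cannot translate sideways.

```python
import numpy as np

# Illustrative state/action definitions for non-holonomic navigation.
# All sizes and bounds are assumptions made for this sketch.
N_BEAMS = 24      # laser range-finder beams included in the state
MAX_RANGE = 10.0  # assumed maximum laser range [m]
V_MAX = 0.5       # assumed maximum linear velocity [m/s]
W_MAX = 1.0       # assumed maximum angular velocity [rad/s]

def make_state(laser_ranges, goal_distance, goal_heading):
    """State: normalized laser scan plus the goal's polar coordinates
    (distance, heading) expressed in the robot frame."""
    scan = np.clip(np.asarray(laser_ranges) / MAX_RANGE, 0.0, 1.0)
    return np.concatenate([scan, [goal_distance, goal_heading]])

def clip_action(action):
    """Action: continuous (v, w). Only forward velocity and turn rate
    are commanded, so sideways motion is impossible (non-holonomic)."""
    v, w = action
    return np.array([np.clip(v, 0.0, V_MAX), np.clip(w, -W_MAX, W_MAX)])

# Example: a state built from a uniform fake scan, goal 3 m away at 0.4 rad.
s = make_state(np.full(N_BEAMS, 5.0), goal_distance=3.0, goal_heading=0.4)
a = clip_action([0.8, -2.0])  # clipped to [0.5, -1.0]
```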

Reinforcement learning is a paradigm in which the agent (the robot) learns its optimal path by interacting with the environment and receiving a reward based on its actions. In this action-reward loop, the optimal action for each state can be discovered by maximizing a predefined accumulated reward that reflects the quality of the trajectory taken by the robot. The goal of the project will be pursued using the deep deterministic policy gradient (DDPG) actor-critic algorithm. Deterministic policies will be used, since they have been proposed to outperform their stochastic counterparts in high-dimensional action spaces. In addition, Simultaneous Localization and Mapping (SLAM) techniques are to be integrated with RL in an attempt to improve the learning rate by providing more accurate estimates of the robot's state.
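As an illustration of the actor-critic structure in DDPG (a generic sketch in PyTorch; the layer widths and the scalar action bound are assumptions, not the networks that will be used in the project):

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Deterministic policy: maps a state directly to one continuous action."""
    def __init__(self, state_dim, action_dim, max_action):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),  # output in [-1, 1]
        )
        self.max_action = max_action

    def forward(self, state):
        return self.max_action * self.net(state)  # scale to actuator limits

class Critic(nn.Module):
    """Q-function: scores a (state, action) pair; its gradient with respect
    to the action is what trains the actor in DDPG."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))
```

During training, the critic is regressed onto temporal-difference targets computed with slowly updated target copies of both networks, while the actor is updated by ascending the critic's output, i.e. minimizing -Critic(s, Actor(s)) over a replay buffer of past transitions.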

Based on the aforementioned goals, the aim of the project can be summarized in the following research objectives:

i. How can the navigation problem of non-holonomic mobile robots be formulated as a reinforcement learning problem that can be solved with the DDPG actor-critic algorithm?
ii. How does the level of discretization of the action space affect the learning rate and the optimality of the learned trajectory?
iii. What types of sensors can be used to provide information about the states of the robot (laser range finder, odometry, camera, etc.)?
iv. Which simulation environment is suitable for testing RL combined with SLAM?
v. Select a suitable reward function that evaluates the quality of the learned trajectory (one candidate shaping is sketched after this list).
vi. Determine whether SLAM helps RL at all by comparing the learning rate with and without SLAM.
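For objective v, a common choice in the navigation-RL literature combines sparse terminal rewards (reaching the goal, colliding) with a dense term proportional to the progress made toward the goal and a small per-step time penalty that encodes the minimum-time requirement. The snippet below is one hypothetical candidate; every constant in it is an assumption made for illustration.

```python
def reward(goal_distance, prev_goal_distance, collided, goal_reached,
           goal_bonus=100.0, collision_penalty=-100.0,
           progress_gain=10.0, time_penalty=-0.1):
    """Hypothetical reward shaping for minimum-time, collision-free
    navigation; all constants are illustrative assumptions."""
    if goal_reached:
        return goal_bonus             # sparse terminal reward for success
    if collided:
        return collision_penalty      # terminal penalty for a collision
    progress = prev_goal_distance - goal_distance
    return progress_gain * progress + time_penalty  # dense shaping + time cost
```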