Towards continuous control for mobile robot navigation: A reinforcement learning and SLAM based approach

Ultimately, the goal when building autonomous mobile systems is to develop intelligent robots which, like humans, are able to gain and use experience to improve their navigation techniques. It is therefore crucial for robots to learn from the consequences of their actions. While path and motion planning in static, known environments is a well-defined problem in the robotics community, this thesis focuses on the more challenging topic of robot navigation in unknown environments given only local perception.

The motivation for this work is thus to formulate mobile robot navigation as a reinforcement learning problem in which (sub)optimal trajectories to desired targets are realized through trial-and-error interaction with the environment. To this end, a model-free deep deterministic policy gradient approach within an off-policy actor-critic framework is adopted, aiming to train a motion planner end-to-end to navigate to any random target within the workspace. The motion planner takes raw laser data, the target position in the robot's local frame, and the previously executed actions as inputs, and produces continuous linear and angular velocity commands as outputs, so the robot has to rely on its on-board sensors alone to perform the navigation task. The main novelty of this work is to shape the reward function based on knowledge about the environment that the robot acquires online during training. This knowledge is obtained through grid mapping with a Rao-Blackwellized particle filter, so that the robot can learn a (sub)optimal policy in fewer iteration steps by increasing its awareness of the locations of the surrounding obstacles.
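The idea of map-based reward shaping can be illustrated with a minimal sketch. All function names, constants, and the simple 3x3 neighbourhood rule below are hypothetical illustrations, not the thesis's actual formulation: a dense reward combines progress toward the target with a penalty that grows as the occupancy grid built online by the SLAM module reports obstacles near the robot.

```python
import math

def shaped_reward(robot_xy, target_xy, occ_grid, cell=0.1,
                  r_goal=100.0, r_crash=-100.0, k_prog=1.0, k_obs=0.5):
    """Illustrative shaped reward (hypothetical constants).

    occ_grid is a 2D list of 0/1 occupancy values maintained online by
    the mapping module; cell is the grid resolution in metres.
    """
    dx = target_xy[0] - robot_xy[0]
    dy = target_xy[1] - robot_xy[1]
    dist = math.hypot(dx, dy)
    if dist < 0.3:                     # target reached: sparse bonus
        return r_goal

    # index of the cell the robot currently occupies
    ci, cj = int(robot_xy[0] / cell), int(robot_xy[1] / cell)
    if occ_grid[ci][cj]:               # robot inside an occupied cell
        return r_crash                 # treated as a collision

    # count occupied cells in the 3x3 neighbourhood of the robot:
    # the more obstacles the map reports nearby, the larger the penalty
    occupied = 0
    for i in range(ci - 1, ci + 2):
        for j in range(cj - 1, cj + 2):
            if 0 <= i < len(occ_grid) and 0 <= j < len(occ_grid[0]):
                occupied += occ_grid[i][j]

    # dense term: negative distance to goal, plus obstacle-awareness penalty
    return -k_prog * dist - k_obs * occupied
```

Because the penalty term only exists where the online-built map already marks obstacles, the shaping signal sharpens as exploration fills in the grid, which is the mechanism by which map knowledge can reduce the number of training iterations.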

Additionally, the learned planner generalizes to unseen virtual environments as well as to a real non-holonomic differential-drive robot platform without any fine-tuning on real-world samples. To validate the effectiveness of the proposed approach, two different virtual simulation environments are explored. The evaluation indicates that the proposed approach decreases the number of iteration steps significantly, by 35.1% and 23.8% on the first and second environments respectively. Further performance evaluation metrics are also introduced which demonstrate that the proposed approach significantly outperforms the standard reinforcement learning approach.