Machine learning for semantic visual SLAM

Finished: 2020-05-31

BSc assignment

The flourishing research on aerial robotics conducted over the last decade has progressively fostered a growing interest in the study of unmanned aerial vehicles (UAVs) that actively interact with the surrounding environment, giving birth to the novel branch of aerial physical interaction, often also referred to as aerial manipulation. In this context, example use-case applications include the contact-based inspection and maintenance of sensitive sites and infrastructures.

A fundamental requirement for autonomous interactive aerial robots is the ability to perceive the environment they will interact with. Achieving consistent and repeatable interaction behavior typically requires prior knowledge of the environment's geometric properties, which is not feasible for autonomous robots operating in unstructured scenarios. A solution is to combine computer vision and machine learning techniques with interaction control techniques.

In the context of the SPECTORS project, cf. https://spectors.eu/wordpress/, at the Robotics and Mechatronics (RAM) research group at the University of Twente, we have already developed a fully-actuated multi-rotor aerial vehicle which can be controlled in all the directions of its configuration space, thus achieving a partial decoupling of the robot dynamics. This property has been demonstrated to be particularly useful for the accomplishment of aerial physical interaction tasks. Thanks to a previously developed energy-based interaction controller, we are able to perform the contact inspection of surfaces whose geometric model is known a priori. Recently, we have been focusing our attention on the fulfillment of autonomous interaction tasks with unknown objects. Thanks to the integration of visual simultaneous localization and mapping (V-SLAM) algorithms such as, e.g., ORB-SLAM2, our aerial robot can be controlled to scan the environment and produce a point cloud of the object of interest. However, at present, the bounding box around the interaction target is still selected manually by the user.
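To make the role of this manual step concrete, the sketch below shows how a user-specified axis-aligned bounding box could be used to extract the target points from a V-SLAM map. This is a minimal illustration only: the file name, function name, and box coordinates are assumptions for the example and are not part of the existing framework.

    import numpy as np

    def crop_to_bbox(points, min_corner, max_corner):
        """Return the subset of an (N, 3) point array that lies inside an
        axis-aligned bounding box given by its two opposite corners."""
        min_corner = np.asarray(min_corner)
        max_corner = np.asarray(max_corner)
        inside = np.all((points >= min_corner) & (points <= max_corner), axis=1)
        return points[inside]

    if __name__ == "__main__":
        # Hypothetical example: map points exported from the V-SLAM system
        # as an (N, 3) array in the world frame, one "x y z" triple per row.
        map_points = np.loadtxt("map_points.txt")

        # Bounding box currently chosen by hand by the operator.
        target_points = crop_to_bbox(map_points,
                                     min_corner=[0.5, -0.3, 0.0],
                                     max_corner=[1.2,  0.3, 0.8])
        print(f"{len(target_points)} of {len(map_points)} points belong to the target")

Automating the choice of these box corners is precisely the step that a semantic understanding of the map should replace.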

The goal of this BSc thesis assignment is to integrate machine learning techniques with V-SLAM algorithms to achieve a semantic segmentation of the environment and, possibly, the automatic detection of the target, once the employed algorithms have been trained on objects of the same class. In the first phase of the assignment, the candidate will study the state of the art in order to identify the most promising techniques for solving this problem. In the second phase, the candidate will integrate the chosen algorithm(s) into our interaction framework. A comparison of the performance of the algorithms in different scenarios is also envisioned.
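As a starting point for the literature study, the sketch below illustrates one possible shape such an integration could take: an off-the-shelf semantic segmentation network (here DeepLabV3 from torchvision, chosen purely as an example) labels a V-SLAM keyframe, the map points are projected into that keyframe using the estimated camera pose and intrinsics, and the points that land on pixels of the target class yield an automatic bounding box. The chosen network, the variable names, and the camera parameters are illustrative assumptions; the actual techniques are to be selected by the candidate.

    import numpy as np
    import torch
    import torchvision

    # Pretrained segmentation network (example choice only; torchvision >= 0.13).
    model = torchvision.models.segmentation.deeplabv3_resnet50(weights="DEFAULT").eval()

    def segment(image_rgb):
        """Run semantic segmentation on an HxWx3 uint8 image and return an
        HxW array of per-pixel class indices (Pascal VOC classes)."""
        x = torch.from_numpy(image_rgb).permute(2, 0, 1).float() / 255.0
        x = torchvision.transforms.functional.normalize(
            x, mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
        with torch.no_grad():
            out = model(x.unsqueeze(0))["out"][0]   # (num_classes, H, W)
        return out.argmax(0).numpy()                # (H, W) class map

    def label_map_points(points_w, T_cw, K, class_map):
        """Assign a class label to each 3-D map point by projecting it into a
        segmented keyframe. points_w: (N, 3) world points; T_cw: 4x4 world-to-
        camera transform; K: 3x3 intrinsics. Points behind the camera or
        outside the image keep the label -1."""
        h, w = class_map.shape
        p_c = (T_cw[:3, :3] @ points_w.T + T_cw[:3, 3:4]).T   # camera frame
        labels = np.full(len(points_w), -1, dtype=int)
        valid = p_c[:, 2] > 0.1                                # in front of camera
        uv = (K @ (p_c[valid] / p_c[valid, 2:3]).T).T[:, :2]
        u = np.round(uv[:, 0]).astype(int)
        v = np.round(uv[:, 1]).astype(int)
        inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
        idx = np.flatnonzero(valid)[inside]
        labels[idx] = class_map[v[inside], u[inside]]
        return labels

    def bbox_of_class(points_w, labels, target_class):
        """Axis-aligned bounding box of all points carrying the target class,
        or None if no point carries that label."""
        pts = points_w[labels == target_class]
        if len(pts) == 0:
            return None
        return pts.min(axis=0), pts.max(axis=0)

In such a hypothetical pipeline, a call like bbox_of_class(map_points, label_map_points(map_points, pose, K, segment(keyframe)), target_class) would replace the bounding box that is currently drawn by hand; evaluating whether this or an alternative approach performs best in different scenarios is part of the assignment.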