Towards reducing the sample complexity of a model-free reinforcement learning agent controlling a single-segment tendon-driven continuum manipulator

Soft robotic endoscopes have the potential to play an instrumental role in minimally invasive surgery, reducing the patients’ recovery time and discomfort. However, control of these manipulators is challenging due to the computational cost of accurate models. Reinforcement learning can alleviate the need for such models by directly learning a policy instead. Unfortunately, model-free reinforcement learning techniques suffer from high sample complexity, limiting their practical use.

This work aims to outline an end-to-end process to develop a practically viable reinforcement learning controller based on the soft actor-critic algorithm by reducing its sample complexity. A tendon-driven continuum manipulator is fabricated and then modeled using a non-linear autoregressive exogenous neural network. This model is used to generate a student policy that imitates expert behavior, as well as the policy of a model-free agent trained in simulation. Both the simulated agent’s policy and the student policy are used to initialize a model-free learner, with the intent of reducing the sample complexity by allowing the agent to focus on fine-tuning an already competent policy. The effectiveness of these methods is evaluated by comparing their performance as a function of learning time with that of an agent trained without any prior knowledge.
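To make the modeling step more concrete, the following is a minimal sketch of how a non-linear autoregressive exogenous (NARX) model is typically set up: the next output is predicted from a window of past outputs and past inputs. It is an illustration under stated assumptions, not the implementation used in this work. The function names, the lag orders `n_u` and `n_y`, and the convention of using only strictly past inputs are all hypothetical, and `model` stands in for the trained neural network.

```python
from typing import Callable, List, Sequence, Tuple


def narx_dataset(
    u: Sequence[float], y: Sequence[float], n_u: int, n_y: int
) -> List[Tuple[List[float], float]]:
    """Build (regressor, target) pairs for a NARX model.

    Each regressor holds the last n_y outputs followed by the last n_u inputs
    (assumed convention); the target is the next output y[t].
    """
    if len(u) != len(y):
        raise ValueError("u and y must have the same length")
    start = max(n_u, n_y)
    pairs = []
    for t in range(start, len(y)):
        regressor = list(y[t - n_y:t]) + list(u[t - n_u:t])
        pairs.append((regressor, y[t]))
    return pairs


def simulate(
    model: Callable[[List[float]], float],
    u: Sequence[float],
    y_init: Sequence[float],
    n_u: int,
    n_y: int,
) -> List[float]:
    """Roll the model forward, feeding its own predictions back as past outputs."""
    start = max(n_u, n_y)
    if len(y_init) != start:
        raise ValueError("y_init must contain max(n_u, n_y) samples")
    y = list(y_init)
    for t in range(start, len(u)):
        regressor = y[t - n_y:t] + list(u[t - n_u:t])
        y.append(model(regressor))
    return y
```

In a setting like the one described here, `u` could be a recorded actuation signal and `y` a measured manipulator response. The pairs from `narx_dataset` would be used to fit the network, and `simulate` would then serve as the learned environment in which an agent is trained before being transferred to hardware.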

Results indicate that while the endoscope is able to learn a reaching task, the sparsity of information about the state space in the student policy and the inaccuracies of the model used to develop the simulated agent lead to performance that is similar to or worse than that of the agent trained without prior knowledge for a given number of training steps.
