Modeling, Simulation and Control of a set-up built to study flapping motion

As part of the Portwings project, the Robotics and Mechatronics group at the University of Twente developed a two-degree-of-freedom flapping-wing setup designed to be placed in a wind tunnel. The setup provides real-time force and torque measurements through a dedicated sensor, and each axis is independently actuated by an electric motor. The goal of the setup is to deepen the understanding of the unsteady dynamics of flapping flight, which remain poorly understood.

With such a setup available, the question remains how to control each axis to produce efficient flapping behaviour. This report explores techniques for optimising flapping behaviour in a model-free way, using experience gathered directly from the wind tunnel setup.
Initial attempts at optimisation use an actor-critic approach from reinforcement learning. These attempts expose two problems: reward sparsity, and the difficulty of encoding policy constraints effectively so that the policy produces a stable flapping motion.

This motivates redefining the policy in terms of the Fourier decomposition of a flapping trajectory, which automatically encodes the requirement of periodic motion and moves the problem away from the formal reinforcement learning definition towards a conceptually simpler model-free parameter optimisation. Optimisation over this policy uses the actor-critic structure from the original reinforcement learning problem. Experiments show that the algorithm works for simple reward functions but struggles to optimise more complicated rewards.
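
To illustrate why a Fourier parameterisation encodes periodicity by construction, the following is a minimal sketch of how a setpoint for one axis could be evaluated from a truncated Fourier series. It is not the implementation used in this report: the function name `fourier_setpoint`, the argument names, the choice of two harmonics and the numerical values are all assumptions made for illustration.

```python
import math


def fourier_setpoint(t, offset, cos_coeffs, sin_coeffs, frequency):
    """Evaluate a periodic axis setpoint at time t (seconds).

    Hypothetical parameterisation (an assumption, not taken from the report):
        theta(t) = offset + sum_k [a_k cos(2 pi k f t) + b_k sin(2 pi k f t)]
    where k runs from 1 to the number of harmonics.
    """
    angle = offset
    for k, (a_k, b_k) in enumerate(zip(cos_coeffs, sin_coeffs), start=1):
        phase = 2.0 * math.pi * k * frequency * t
        angle += a_k * math.cos(phase) + b_k * math.sin(phase)
    return angle


# Illustrative parameter vector for a single axis: these numbers are made up.
params = {
    "offset": 0.1,
    "cos_coeffs": [0.5, 0.2],
    "sin_coeffs": [0.3, -0.1],
    "frequency": 2.0,  # flapping frequency in Hz
}

period = 1.0 / params["frequency"]
theta_now = fourier_setpoint(0.37, **params)
theta_next_cycle = fourier_setpoint(0.37 + period, **params)
```

Under these assumptions, any choice of coefficients yields a trajectory that repeats every `1 / frequency` seconds, so an optimiser only has to search over a small vector of coefficients rather than learn periodicity from the reward.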