The SMART research group at Saxion University of Applied Sciences is working on the design and implementation of a firefighting UAS. This UAS comprises a quadrotor and an impulse-based firefighting cannon, called IFEX, which uses kinetic energy to extinguish fires effectively. The system propels water with high-pressure air at high velocity, producing a fine mist that rapidly lowers fire temperatures and displaces oxygen as it evaporates.
When the IFEX system releases the pressurised water, a reaction force is generated. Modelling this reaction force from first principles is challenging due to the complexity of the IFEX system, but experimental tests have shown that it can be modelled as an impulse force.
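As an illustration only (not necessarily the exact model identified in [2]), such an impulse-type reaction force can be written as

F_r(t) \approx \hat{J}\, \delta(t - t_0),

where \hat{J} is the experimentally identified momentum transferred to the ejected water, \delta(\cdot) is the Dirac delta, and t_0 is the release time; in simulation this is typically approximated by a finite-duration pulse of magnitude \hat{J}/\Delta t acting over a short pulse width \Delta t.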
This impulse force can be treated as a disturbance force when designing a controller for the UAS. However, this disturbance is predictable, since the operator (or the autonomous system) decides when to release the water and therefore when the disturbance occurs.
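A minimal sketch (not the implementation from [2]) of how this predictability could be exposed to a learning-based controller is to append a countdown-to-release feature to the observation vector; all names and the normalisation horizon below are illustrative assumptions.

```python
# Sketch only: expose the known release time to the policy as a normalised
# countdown feature appended to the UAS state. Names and values are assumptions.
import numpy as np


def augment_observation(state: np.ndarray,
                        t_now: float,
                        t_release: float,
                        horizon: float = 1.0) -> np.ndarray:
    """Append a countdown-to-release feature in [0, 1] to the UAS state.

    The feature is 1 when the release is `horizon` seconds (or more) away,
    decays linearly to 0 at the release instant, and stays 0 afterwards,
    so the policy can anticipate the impulse and time a counteracting action.
    """
    time_to_release = t_release - t_now
    countdown = np.clip(time_to_release / horizon, 0.0, 1.0)
    return np.concatenate([state, [countdown]])


if __name__ == "__main__":
    state = np.zeros(12)  # e.g. position, velocity, attitude, angular rate
    print(augment_observation(state, t_now=4.7, t_release=5.0))
```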
In [2], we addressed the following research question: can a Reinforcement Learning (RL) policy learn to control this system and exploit the available trigger signal to minimise the effect of the impulse disturbance?
The results showed that the RL-based controller exploited the trigger signal to reduce the position error and control effort compared to RL policies without access to the trigger signal. However, it did so by performing a pre-emptive manoeuvre that changed the attitude of the UAS to prepare for the incoming disturbance. This manoeuvre does not take into account the main objective of the UAS: releasing the water in the direction of the fire.
Therefore, the work in [2] explored the use of RL for minimising the impact of the impulse, but did not take into account the aiming objective of the system.
This assignment extends the work done in [2] by including the “aiming” objective in the RL policy training. This aiming objective can be abstracted to a vision constraint that the controller has to satisfy.
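As one possible formalisation (the frame conventions, names, and penalty form below are assumptions for this sketch, not the formulation that will necessarily be used), the vision constraint can be expressed as keeping the angle between the nozzle or camera boresight axis and the line of sight to the fire below a field-of-view half-angle.

```python
# Illustrative sketch of an aiming / vision constraint: keep the angle between
# the body boresight axis and the direction to the target below a half-angle.
# Frame conventions, names, and the penalty form are assumptions for this sketch.
import numpy as np


def aiming_error(R_wb: np.ndarray,
                 p_uas: np.ndarray,
                 p_fire: np.ndarray,
                 boresight_b: np.ndarray = np.array([1.0, 0.0, 0.0])) -> float:
    """Angle (rad) between the boresight axis (body frame) and the line of
    sight from the UAS to the fire, both expressed in the world frame."""
    boresight_w = R_wb @ boresight_b            # boresight axis in world frame
    los = p_fire - p_uas
    los = los / np.linalg.norm(los)             # unit line-of-sight vector
    cos_angle = np.clip(boresight_w @ los, -1.0, 1.0)
    return float(np.arccos(cos_angle))


def vision_penalty(angle: float, half_fov: float = np.deg2rad(20.0)) -> float:
    """Zero while the target is inside the assumed field of view,
    growing linearly with the violation otherwise (one possible shaping term)."""
    return max(0.0, angle - half_fov)


if __name__ == "__main__":
    R_wb = np.eye(3)                            # level attitude
    err = aiming_error(R_wb, p_uas=np.zeros(3),
                       p_fire=np.array([10.0, 1.0, 0.0]))
    print(err, vision_penalty(err))
```

Whether such a term enters the RL problem as a reward penalty or as an explicit constraint is part of sub-question 2 below.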
This research addresses the following main RQ:
Can an RL-based controller learn to counteract a predictable impulse disturbance while satisfying vision constraints?
It also addresses the following sub-questions:
1. What are the most stability-critical assumptions in [2] that affect the RL policy (ablation study)?
2. How can vision constraints be included in the RL training?
3. How does the RL controller compare to model-based controllers in terms of satisfying the vision constraints, counteracting the impulse disturbance, and minimising its effects?
Note: the topic of fire detection, tracking, and localisation is out of this assignment’s scope.
[1] IFFS Project: https://www.saxion.nl/onderzoek/projecten/i/the-iffs-project
[2] Kousheek Chakraborty, Thijs Hof, Ayham Alharbat, and Abeje Mersha, “RL-based Control of UAVs Subject to Significant Disturbance”, submitted to ICUAS 2025.