Efficient path planning for multiple agents in agriculture fields

Finished: 2020-02-27

MSc assignment

An active area of research is co-ordinating multi-agent systems to collaborate in solving complex tasks. A proposed use case is a group of robots operating in a large agricultural field. The robots should strategically divide operations between themselves, such as mapping, weed detection, seeding, yield estimation and others. As an initial research question, the task of exploring the field optimally will be investigated further. The topology of the field can be readily defined using satellite technology and known GPS co-ordinates. Furthermore, sub-regions can be selected or identified within this field using satellite images, such as probable regions of unhealthy vegetation. The problem therefore becomes a travelling salesman type problem with multiple agents, where the goal is to optimally distribute visiting and exploring each sub-region between the agents.

It is desired that a dynamic, autonomous algorithm be developed to create optimal trajectories. Two approaches will be investigated: a centralised algorithm, where a single master co-ordinates the actions of the agents, and a decentralised algorithm, where the agents must communicate between themselves and decide on appropriate actions. Also, two types of algorithms will be investigated and compared with each other: multi-agent reinforcement learning algorithms and meta-heuristic algorithms inspired by nature. Relevant performance metrics include the time to complete the task, overlaps between trajectories and agent utilisation.

Multi-agent reinforcement learning is an extension of traditional reinforcement learning algorithms, which focus on a single agent. In the single agent problem, the agent interacts with its environment by taking actions based on observations, and receives rewards based on outcomes. This process is repeated numerous times so that the agent can learn behaviour policies which maximise its total reward. To extend this to the multi-agent problem, the learning of each individual agent must be balanced with the overall goal of the swarm, and the partial observations of each agent must be combined to obtain a global state observation. This is a challenging task and there is no consensus on the best algorithm! this research will implement a recent successful algorithm such as Value-Decomposition Networks or Mean-Feature Embeddings.

Nature algorithms include ant colony optimisation (ACO) inspired by ants and particle swarm optimisation (PSO) inspired by bird flocking. Both have been applied successfully to traveling salesman type problems. In ACO, artificial pheromones are used to create ant trails that guide a stochastic search procedure. In PSO, the best position of each particle as well as the best position of the entire swarm is used to guide the search process. Metaheuristic algorithms have less complexity than reinforcement learning, but they are not guaranteed to converge to the optimal solution.