# Adaptive Policy Switching for Multi-Agent ASVs in Multi-Objective Aquatic Cleaning Environments

**Authors:** Dame Seck, Samuel Yanes-Luis, Manuel Perales-Esteve, Sergio Toral Marín, Daniel Gutiérrez-Reina

PMC · DOI: 10.3390/s26020427 · Sensors (Basel, Switzerland) · 2026-01-09

## TL;DR

This paper proposes a multi-agent deep reinforcement learning framework that coordinates Autonomous Surface Vehicles (ASVs) cleaning plastic pollution from water, adaptively switching each vehicle between exploration and cleaning tasks.

## Contribution

The novel contribution is an adaptive multi-objective deep reinforcement learning framework with shared multitask networks and reward-based task switching for multi-agent aquatic cleanup.

## Key findings

- The proposed framework improves the hypervolume and uniformity metrics by 14% and 300%, respectively, compared to fixed-phase approaches.
- The system adapts to diverse trash distributions and generates a portfolio of effective strategies for autonomous cleanup.
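For readers unfamiliar with the hypervolume metric, the following is a minimal sketch of how a Pareto front and its two-objective hypervolume can be computed. It assumes both objectives (for example, a coverage score and a trash-collection score) are maximized and that every point dominates the reference point. The function names, the reference point, and the sample values are illustrative assumptions, not taken from the paper, which may compute its metrics differently.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (objective_1, objective_2), both maximized


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, sorted by the first objective."""
    front = []
    for p in points:
        # p is dominated if a different point is at least as good in both objectives.
        dominated = any(
            q != p and q[0] >= p[0] and q[1] >= p[1] for q in points
        )
        if not dominated:
            front.append(p)
    return sorted(set(front))


def hypervolume_2d(front: List[Point], ref: Point = (0.0, 0.0)) -> float:
    """Area dominated by the front and bounded by the reference point.

    Assumes every point in `front` is at least `ref` in both objectives.
    """
    area = 0.0
    prev_y = ref[1]
    # Sweep from the largest first objective to the smallest; along a
    # non-dominated front the second objective grows as the first shrinks.
    for x, y in sorted(front, reverse=True):
        if y > prev_y:
            area += (x - ref[0]) * (y - prev_y)
            prev_y = y
    return area


if __name__ == "__main__":
    # Hypothetical (exploration, cleaning) scores for five policies.
    candidates = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
    front = pareto_front(candidates)
    print(front, hypervolume_2d(front))
```

A larger hypervolume means the front covers more of the objective space, so a relative gain such as the reported 14% indicates that the non-dominated solutions jointly dominate a larger region than those of the baseline.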

## Abstract

Plastic pollution in aquatic environments is a major ecological problem requiring scalable autonomous solutions for cleanup. This study addresses the coordination of multiple Autonomous Surface Vehicles (ASVs) by formulating the problem as a Partially Observable Markov Game and decoupling the mission into two tasks: exploration to maximize coverage and cleaning to collect trash. These tasks share navigation requirements but present conflicting goals, motivating a multi-objective learning approach. The proposed multi-agent deep reinforcement learning framework uses a single Multitask Deep Q-network shared by all agents, with a convolutional backbone and two heads, one dedicated to exploration and the other to cleaning. Parameter sharing and an egocentric state design leverage agent homogeneity and enable experience aggregation across tasks. An adaptive mechanism governs task switching, combining task-specific rewards through a weighted aggregation and selecting tasks via a reward-greedy strategy. This enables the construction of Pareto fronts capturing non-dominated solutions. The framework outperforms fixed-phase approaches, improving the hypervolume and uniformity metrics by 14% and 300%, respectively. It also adapts to diverse initial trash distributions, providing decision-makers with a portfolio of effective and adaptive strategies for autonomous plastic cleanup.
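The abstract does not spell out the switching rule, so the following is only a hedged sketch of one way reward-greedy task selection with a weighted aggregation could look. It assumes a single scalar weight `w` in [0, 1] that scales the exploration reward estimate while `1 - w` scales the cleaning estimate, and that the agent then acts greedily on the Q-values of the chosen head. The names `select_task`, `greedy_action`, and `w`, and all numeric values, are hypothetical and not taken from the paper.

```python
from typing import Dict, List


def select_task(expected_reward: Dict[str, float], w: float) -> str:
    """Pick the task with the highest weighted reward estimate.

    `w` weights exploration and `1 - w` weights cleaning (an assumed
    aggregation scheme; the paper's exact rule may differ).
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    weighted = {
        "explore": w * expected_reward["explore"],
        "clean": (1.0 - w) * expected_reward["clean"],
    }
    # On a tie, max returns the first key, i.e. "explore".
    return max(weighted, key=weighted.get)


def greedy_action(q_values: Dict[str, List[float]], task: str) -> int:
    """Return the index of the best action according to the chosen head."""
    head = q_values[task]
    return max(range(len(head)), key=head.__getitem__)


if __name__ == "__main__":
    # Hypothetical per-task reward estimates and per-head Q-values for one agent.
    rewards = {"explore": 0.8, "clean": 0.5}
    q_values = {"explore": [0.1, 0.9, 0.3], "clean": [0.7, 0.2, 0.1]}
    for w in (0.2, 0.5, 0.8):
        task = select_task(rewards, w)
        print(w, task, greedy_action(q_values, task))
```

Under this assumption, sweeping `w` from 0 to 1 and evaluating the resulting behavior yields one (exploration, cleaning) outcome per weight, and the non-dominated outcomes among them form the kind of Pareto front the abstract describes.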

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12845727/full.md

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12845727/full.md

## References

32 references — full list in the complete paper: https://tomesphere.com/paper/PMC12845727/full.md

---
Source: https://tomesphere.com/paper/PMC12845727