# Multimodal Shared Autonomy for Heavy-Load UAV Operations with Physics-Aware Cooperative Control

**Authors:** Xu Gao, Jingfeng Wu, Yuchen Wang, Can Cao, Lihui Wang, Bowen Wang, Yimeng Zhang

PMC · DOI: 10.3390/s26061997 · Sensors (Basel, Switzerland) · 2026-03-23

## TL;DR

This paper introduces a shared autonomy framework for heavy-load UAVs that fuses speech commands, visual gestures, and haptic cues to infer operator intent in real time, improving control and reducing operator workload.

## Contribution

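The Multimodal Fusion Cooperation Network (MFCN) is an end-to-end framework that fuses multimodal operator inputs into an intent representation and translates it into dynamically feasible commands through a cooperative control policy with physics-aware constraints.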

## Key findings

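- MFCN improves task success rate, positioning accuracy, and payload stability compared with manual, unimodal, and heuristic multimodal baselines.
- The framework reduces task completion time and operator cognitive workload relative to the same baselines.
- Physics-aware constraints embedded in the cooperative control policy suppress payload oscillations and support flight stability.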
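
To make the idea of a physics-aware constraint concrete, here is a minimal sketch in Python. It is not the constraint used in the paper, which this summary does not specify. It assumes a simple cable-suspended point-mass payload, for which a steady horizontal acceleration `a` tilts the cable to a quasi-static angle `atan(a / g)`. The function names and the swing limit `theta_max_rad` are hypothetical.

```python
import math
from typing import Tuple

G = 9.81  # gravitational acceleration in m/s^2


def max_accel_for_swing(theta_max_rad: float) -> float:
    """Largest steady horizontal acceleration that keeps the quasi-static
    cable angle of a point-mass pendulum payload within theta_max_rad."""
    return G * math.tan(theta_max_rad)


def clamp_accel_command(ax: float, ay: float,
                        theta_max_rad: float) -> Tuple[float, float]:
    """Scale a horizontal acceleration command (assumed model, not the
    paper's policy) so its magnitude respects the swing-angle limit.
    The direction of the command is preserved."""
    limit = max_accel_for_swing(theta_max_rad)
    mag = math.hypot(ax, ay)
    if mag <= limit or mag == 0.0:
        return ax, ay  # already feasible under the assumed model
    scale = limit / mag
    return ax * scale, ay * scale
```

A command within the limit passes through unchanged, while a larger one is shrunk along its original direction. A learned policy with embedded constraints, as described in the abstract, would be more elaborate than this clamp.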

## Abstract

Heavy-load unmanned aerial vehicles (UAVs) are increasingly being applied in logistics, infrastructure installation, and emergency response missions, where complex payload dynamics and unstructured environments pose significant challenges to safe and efficient operation. Conventional manual teleoperation interfaces, such as dual-joystick control, impose a high cognitive workload and provide limited support for expressing high-level operator intent, while fully autonomous solutions remain difficult to deploy reliably under real-world uncertainty. To address these limitations, this paper proposes the Multimodal Fusion Cooperation Network (MFCN), an end-to-end shared autonomy framework that integrates speech commands, visual gestures, and haptic cues through cross-modal feature fusion to infer operator intent in real time. The fused intent representation is translated into dynamically feasible control commands by a cooperative control policy with embedded physics-aware constraints to suppress payload oscillations and ensure flight stability. Extensive semi-physical simulations and real-world experiments demonstrate that the MFCN significantly improves the task success rate, positioning accuracy, and payload stability while reducing the task completion time and operator cognitive workload compared with manual, unimodal, and heuristic multimodal baselines.
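
As a rough illustration of combining several input modalities into one intent estimate, the sketch below uses confidence-weighted late fusion. This is a deliberately simplified stand-in: the abstract describes learned cross-modal feature fusion, whose architecture is not given in this summary. The modality names, intent classes, and confidence scores are all assumptions made for the example.

```python
import math
from typing import Dict, List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_intent(modality_probs: Dict[str, List[float]],
                confidences: Dict[str, float]) -> List[float]:
    """Blend per-modality intent distributions into a single distribution.

    Each modality contributes in proportion to the softmax of its
    (hypothetical) confidence score, so the result still sums to 1.
    """
    names = list(modality_probs)
    weights = softmax([confidences[n] for n in names])
    n_intents = len(modality_probs[names[0]])
    fused = [0.0] * n_intents
    for w, name in zip(weights, names):
        for i, p in enumerate(modality_probs[name]):
            fused[i] += w * p
    return fused


# Hypothetical intent classes: [hover, move_forward, lower_payload]
probs = {
    "speech": [0.7, 0.2, 0.1],
    "gesture": [0.2, 0.6, 0.2],
    "haptic": [0.4, 0.3, 0.3],
}
conf = {"speech": 2.0, "gesture": 0.5, "haptic": 0.0}
fused = fuse_intent(probs, conf)
```

Here the speech channel has the highest confidence, so the fused estimate leans toward its preferred intent. Late fusion of this kind resembles the heuristic multimodal baselines the abstract compares against more than the learned fusion in MFCN.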

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13030226/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13030226/full.md

## References

70 references — full list in the complete paper: https://tomesphere.com/paper/PMC13030226/full.md

---
Source: https://tomesphere.com/paper/PMC13030226