# On-board Deep Q-Network for UAV-assisted Online Power Transfer and Data Collection

**Authors:** Kai Li, Wei Ni, Eduardo Tovar

arXiv: 1906.07064 · 2019-06-18

## TL;DR

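This paper introduces an on-board deep Q-network that lets a UAV choose which sensing device to charge and interrogate, and at what patrolling velocity to fly, reducing packet loss by at least 69.2% relative to non-learning greedy algorithms even though the UAV holds only outdated knowledge of device battery levels and data queues.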

## Contribution

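It formulates device selection and patrolling-velocity control as a Markov Decision Process and proposes an on-board deep Q-network that accommodates an enlarged state space, together with a deep reinforcement learning based scheduling algorithm that asymptotically derives the optimal solution online from outdated device information.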

## Key findings

- Reduces packet loss by at least 69.2% compared to existing non-learning greedy algorithms.
- Enables online decision-making with limited and outdated device state information.
- Demonstrates effectiveness of deep Q-learning in UAV-assisted wireless power and data collection.

## Abstract

Unmanned Aerial Vehicles (UAVs) with Microwave Power Transfer (MPT) capability provide a practical means to deploy a large number of wireless powered sensing devices into areas with no access to persistent power supplies. The UAV can charge the sensing devices remotely and harvest their data. A key challenge is online MPT and data collection in the presence of on-board control of a UAV (e.g., patrolling velocity) for preventing battery drainage and data queue overflow of the sensing devices, while up-to-date knowledge on battery level and data queue of the devices is not available at the UAV. In this paper, an on-board deep Q-network is developed to minimize the overall data packet loss of the sensing devices, by optimally deciding the device to be charged and interrogated for data collection, and the instantaneous patrolling velocity of the UAV. Specifically, we formulate a Markov Decision Process (MDP) with the states of battery level and data queue length of sensing devices, channel conditions, and waypoints given the trajectory of the UAV; and solve it optimally with Q-learning. Furthermore, we propose the on-board deep Q-network that can enlarge the state space of the MDP, and a deep reinforcement learning based scheduling algorithm that asymptotically derives the optimal solution online, even when the UAV has only outdated knowledge on the MDP states. Numerical results demonstrate that the proposed deep reinforcement learning algorithm reduces the packet loss by at least 69.2%, as compared to existing non-learning greedy algorithms.
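The MDP and Q-learning formulation described in the abstract can be illustrated with a minimal tabular sketch. This is not the authors' implementation: the toy dynamics, the reward (negative count of lost packets), and every name and constant (`NUM_DEVICES`, `QUEUE_CAP`, `step`, `q_learning`, the arrival and upload probabilities) are assumptions made for illustration. The state holds per-device queue lengths, per-device battery levels, and the current waypoint; an action picks a device and a velocity.

```python
import random
from collections import defaultdict

# Hypothetical toy constants, not taken from the paper.
NUM_DEVICES = 2
QUEUE_CAP = 3        # packets a device can buffer before overflow
BATTERY_CAP = 3      # battery units a device can store
NUM_WAYPOINTS = 4    # waypoints on a fixed, cyclic trajectory
VELOCITIES = (0, 1)  # 0 = advance one waypoint, 1 = advance two

# An action selects which device to charge/interrogate and the velocity.
ACTIONS = [(d, v) for d in range(NUM_DEVICES) for v in VELOCITIES]


def step(state, action, rng):
    """Assumed toy transition; returns (next_state, reward)."""
    queues, batteries, waypoint = state
    queues, batteries = list(queues), list(batteries)
    device, velocity = action
    lost = 0
    # The selected device is charged by two units (capped).
    batteries[device] = min(BATTERY_CAP, batteries[device] + 2)
    # Assumed channel model: upload is more reliable when the UAV is "near".
    p_success = 0.9 if waypoint % NUM_DEVICES == device else 0.3
    if queues[device] > 0 and rng.random() < p_success:
        queues[device] -= 1
    # Each device generates a packet with probability 0.5. The packet is
    # lost if the battery is empty or the queue is already full.
    for i in range(NUM_DEVICES):
        if rng.random() < 0.5:
            if batteries[i] == 0 or queues[i] == QUEUE_CAP:
                lost += 1
            else:
                queues[i] += 1
                batteries[i] -= 1
    waypoint = (waypoint + 1 + velocity) % NUM_WAYPOINTS
    return (tuple(queues), tuple(batteries), waypoint), -lost


def q_learning(episodes=200, horizon=50, alpha=0.1, gamma=0.9,
               epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = defaultdict(float)  # (state, action) -> estimated return
    for _ in range(episodes):
        state = ((0,) * NUM_DEVICES, (BATTERY_CAP,) * NUM_DEVICES, 0)
        for _ in range(horizon):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward = step(state, action, rng)
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            # Standard Q-learning temporal-difference update.
            td_target = reward + gamma * best_next
            q[(state, action)] += alpha * (td_target - q[(state, action)])
            state = next_state
    return q


if __name__ == "__main__":
    table = q_learning()
    print(len(table), "state-action pairs visited")
```

The table grows with every combination of queue lengths, battery levels, and waypoints, which hints at why the abstract turns to a deep Q-network once the state space is enlarged. The sketch also assumes the UAV observes the true state at each step, whereas the paper addresses the case where that knowledge is outdated.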

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.07064/full.md

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/1906.07064/full.md

## References

47 references — full list in the complete paper: https://tomesphere.com/paper/1906.07064/full.md

---
Source: https://tomesphere.com/paper/1906.07064