Deep deterministic policy gradient with symmetric data augmentation for lateral attitude tracking control of a fixed-wing aircraft
Yifei Li, Erik-Jan van Kampen

TL;DR
This paper introduces a symmetry-based data augmentation method for offline reinforcement learning that improves sample efficiency and accelerates policy convergence in an aircraft lateral attitude tracking task.
Contribution
The paper proposes a symmetric data augmentation technique and a dual-critic structure to improve the sample efficiency of DDPG on dynamical systems that exhibit symmetry.
Findings
Augmented samples accelerate policy convergence in aircraft control simulations.
A second critic, trained on the augmented samples, improves sample utilization efficiency.
Exploiting system symmetry makes offline RL more sample-efficient.
Abstract
The symmetry of dynamical systems can be exploited for state-transition prediction and to facilitate control policy optimization. This paper leverages system symmetry to develop sample-efficient offline reinforcement learning (RL) approaches. Under the symmetry assumption for a Markov Decision Process (MDP), a symmetric data augmentation method is proposed. The augmented samples are integrated into the dataset of Deep Deterministic Policy Gradient (DDPG) to enhance its coverage rate of the state-action space. Furthermore, sample utilization efficiency is improved by introducing a second critic trained on the augmented samples, resulting in a dual-critic structure. The aircraft's model is verified to be symmetric, and flight control simulations demonstrate accelerated policy convergence when augmented samples are employed.
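The augmentation idea in the abstract can be illustrated with a minimal sketch. It rests on assumptions not stated on this page: symmetry is taken to mean reflection about the origin, so that negating both state and action negates the next state and leaves the reward unchanged. The linear model, the quadratic reward, and the names `step`, `reward`, and `mirror` are hypothetical stand-ins, not the authors' aircraft model or code.

```python
import numpy as np

# Hypothetical toy dynamics: s' = A s + B a. A linear map is odd in (s, a),
# so it satisfies the assumed reflection symmetry. This is NOT the aircraft model.
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[0.0],
              [0.5]])

def step(state, action):
    """One transition of the toy system."""
    return A @ state + B @ action

def reward(state, action):
    """Quadratic cost, even in (s, a), so mirrored samples share the same reward."""
    return -float(state @ state + 0.1 * (action @ action))

def mirror(transition):
    """Build the symmetric counterpart of (s, a, r, s') under the assumed symmetry."""
    s, a, r, s_next = transition
    return (-s, -a, r, -s_next)

rng = np.random.default_rng(0)
buffer = []
s = rng.normal(size=2)
for _ in range(5):
    a = rng.normal(size=1)
    s_next = step(s, a)
    sample = (s, a, reward(s, a), s_next)
    buffer.append(sample)          # explored sample
    buffer.append(mirror(sample))  # augmented sample, obtained without extra interaction
    s = s_next
```

Each explored transition yields a second, valid transition at no extra interaction cost, which is how the dataset's coverage of the state-action space can grow. The mirrored samples are only valid if the symmetry actually holds for the system at hand, which is why the abstract notes that the aircraft model is verified to be symmetric first.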
