A Plug-and-Play Fully On-the-Job Real-Time Reinforcement Learning Algorithm for a Direct-Drive Tandem-Wing Experiment Platforms Under Multiple Random Operating Conditions

Zhang Minghao; Song Bifeng; Yang Xiaojun; Wang Liang

arXiv:2410.15554 · cs.LG · December 23, 2024

TL;DR

This paper introduces CRL2E, a plug-and-play, real-time reinforcement learning algorithm designed for stable control of tandem-wing systems under diverse, unpredictable conditions, outperforming existing methods in accuracy and convergence speed.

Contribution

The paper presents a novel Physics-Inspired Rule-Based Policy Composer Strategy with a Perturbation Module for real-time control, significantly improving stability and efficiency over prior algorithms.

Findings

01. CRL2E achieves safe, stable training within 500 steps.

02. CRL2E improves tracking accuracy by up to 66 times over baseline algorithms.

03. CRL2E converges 36-58% faster than previous CRL methods.

Abstract

The nonlinear and unstable aerodynamic interference generated by the tandem wings of such biomimetic systems poses substantial challenges for motion control, especially under multiple random operating conditions. To address these challenges, the Concerto Reinforcement Learning Extension (CRL2E) algorithm has been developed. This plug-and-play, fully on-the-job, real-time reinforcement learning algorithm incorporates a novel Physics-Inspired Rule-Based Policy Composer Strategy with a Perturbation Module alongside a lightweight network optimized for real-time control. To validate the performance and the rationality of the module design, experiments were conducted under six challenging operating conditions, comparing seven different algorithms. The results demonstrate that the CRL2E algorithm achieves safe and stable training within the first 500 steps, improving tracking accuracy by 14 to…

Taxonomy

Topics: Reinforcement Learning in Robotics · Aerospace and Aviation Technology · Advanced Control Systems Optimization

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings