Average-Reward Maximum Entropy Reinforcement Learning for Global Policy in Double Pendulum Tasks

Jean Seong Bjorn Choe; Bumkyu Choi; Jong-kook Kim

arXiv:2505.07516·cs.RO·May 13, 2025

Open Access

TL;DR

This paper introduces an improved average-reward maximum entropy reinforcement learning method tailored for swing-up and stabilization tasks in double pendulum systems, demonstrating robustness and adaptability in simulated environments.

Contribution

We refined the Average-Reward Entropy Advantage Policy Optimization (AR-EAPO) algorithm to handle the new competition scenarios and evaluation metrics of the 3rd AI Olympics at ICRA 2025 for the acrobot and pendubot tasks.

Findings

01 In simulation, the controller handles swing-up and stabilization for both the acrobot and the pendubot.

02 The controller remains robust under the revised competition tasks.

03 The refined method adapts to the updated competition scenarios and evaluation metrics.

Abstract

This report presents our reinforcement learning-based approach for the swing-up and stabilisation tasks of the acrobot and pendubot, tailored specifically to the updated guidelines of the 3rd AI Olympics at ICRA 2025. Building upon our previously developed Average-Reward Entropy Advantage Policy Optimization (AR-EAPO) algorithm, we refined our solution to effectively address the new competition scenarios and evaluation metrics. Extensive simulations validate that our controller robustly manages these revised tasks, demonstrating adaptability and effectiveness within the updated framework.
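
The abstract names the average-reward maximum entropy setting without spelling it out. As a rough illustration only, the sketch below estimates an entropy-regularised average reward from a single trajectory and forms finite-horizon differential returns from it. It is a generic textbook-style construction, not the AR-EAPO algorithm itself, whose details are not given on this page. The function names, the `temperature` parameter, and the sample numbers are all assumptions made for the example.

```python
from typing import List, Sequence


def soft_average_reward(
    rewards: Sequence[float],
    log_probs: Sequence[float],
    temperature: float,
) -> float:
    """Monte Carlo estimate of the entropy-regularised average reward.

    Each step contributes r_t - temperature * log pi(a_t | s_t); the second
    term is a single-sample estimate of temperature * policy entropy.
    (Hypothetical helper, not taken from the paper.)
    """
    if len(rewards) != len(log_probs) or len(rewards) == 0:
        raise ValueError("rewards and log_probs must be non-empty and equal length")
    total = sum(r - temperature * lp for r, lp in zip(rewards, log_probs))
    return total / len(rewards)


def differential_returns(
    rewards: Sequence[float],
    log_probs: Sequence[float],
    temperature: float,
    rho: float,
) -> List[float]:
    """Backward cumulative sums of (soft reward - rho) over a finite trajectory.

    In the average-reward setting there is no discount factor; subtracting the
    average reward estimate rho keeps the sums bounded.
    (Hypothetical helper, not taken from the paper.)
    """
    returns: List[float] = []
    acc = 0.0
    for r, lp in zip(reversed(rewards), reversed(log_probs)):
        acc += (r - temperature * lp) - rho
        returns.append(acc)
    returns.reverse()
    return returns


# Made-up numbers purely for illustration.
rewards = [1.0, 2.0, 3.0]
log_probs = [-1.0, -1.0, -1.0]
rho = soft_average_reward(rewards, log_probs, temperature=0.5)
diff = differential_returns(rewards, log_probs, temperature=0.5, rho=rho)
```

Subtracting the average reward in place of discounting is what separates this setting from the more common discounted formulation. A practical controller would learn a value function and update the policy on top of such quantities, which this sketch leaves out.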

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Robot Manipulation and Learning