Average-Reward Maximum Entropy Reinforcement Learning for Global Policy in Double Pendulum Tasks
Jean Seong Bjorn Choe, Bumkyu Choi, Jong-kook Kim

TL;DR
This paper introduces an improved average-reward maximum entropy reinforcement learning method tailored for swing-up and stabilization tasks in double pendulum systems, demonstrating robustness and adaptability in simulated environments.
Contribution
We refined the AR-EAPO algorithm to handle the new competition scenarios and evaluation metrics of the 3rd AI Olympics at ICRA 2025 for the acrobot and pendubot tasks.
Findings
The controller performs swing-up and stabilization of the acrobot and pendubot in simulation.
The algorithm remains robust across the revised competition tasks.
The method adapts to the updated competition framework and evaluation metrics.
Abstract
This report presents our reinforcement learning-based approach for the swing-up and stabilisation tasks of the acrobot and pendubot, tailored specifically to the updated guidelines of the 3rd AI Olympics at ICRA 2025. Building upon our previously developed Average-Reward Entropy Advantage Policy Optimization (AR-EAPO) algorithm, we refined our solution to effectively address the new competition scenarios and evaluation metrics. Extensive simulations validate that our controller robustly manages these revised tasks, demonstrating adaptability and effectiveness within the updated framework.
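For readers unfamiliar with the setting named in the title, the average-reward maximum entropy objective is commonly written as below. This is a generic textbook-style sketch, not necessarily the exact formulation used in AR-EAPO; the symbols $r$ (reward), $\tau$ (entropy temperature), and $\mathcal{H}$ (policy entropy) are notation assumed here for illustration and do not appear in the abstract.

```latex
% Generic average-reward maximum entropy objective (illustrative notation):
% the policy \pi maximizes the long-run average of reward plus an entropy bonus.
\rho^{\pi}
  = \lim_{T \to \infty} \frac{1}{T}\,
    \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{T-1}
      \Big( r(s_t, a_t) + \tau\, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big)
    \right],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s)\big)
  = -\,\mathbb{E}_{a \sim \pi(\cdot \mid s)}\big[ \log \pi(a \mid s) \big].
```

Unlike the discounted objective, this criterion has no discount factor and weights all time steps equally, which is one reason it is often considered for continuing tasks such as holding a pendulum upright indefinitely.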
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Robot Manipulation and Learning
