Success in Humanoid Reinforcement Learning under Partial Observation
Wuhao Wang, Zhiyong Chen

TL;DR
This paper demonstrates the first successful reinforcement learning of humanoid locomotion under partial observability, using a novel history encoder to achieve performance comparable to methods with full state access.
Contribution
It introduces a novel history encoder that enables stable policy learning under partial observation in high-dimensional humanoid tasks.
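This summary does not describe the architecture of the paper's history encoder. For reference only, here is one generic way to give a policy access to past partial observations. It keeps a fixed-length window of recent observations and compresses the concatenated window into a latent vector. `HistoryBuffer`, `LinearHistoryEncoder`, the window length, and the single tanh layer are hypothetical choices for illustration, not the authors' design.

```python
import math
import random
from collections import deque
from typing import List, Sequence


class HistoryBuffer:
    """Hypothetical fixed-length window of the most recent partial observations."""

    def __init__(self, obs_dim: int, history_len: int) -> None:
        self.obs_dim = obs_dim
        # Pre-fill with zeros so the stacked vector has a constant size from the first step.
        self.frames = deque(
            [[0.0] * obs_dim for _ in range(history_len)], maxlen=history_len
        )

    def push(self, obs: Sequence[float]) -> None:
        if len(obs) != self.obs_dim:
            raise ValueError(f"expected {self.obs_dim} values, got {len(obs)}")
        # deque(maxlen=...) drops the oldest frame automatically.
        self.frames.append([float(x) for x in obs])

    def stacked(self) -> List[float]:
        # Flatten frames from oldest to newest into one vector.
        return [x for frame in self.frames for x in frame]


class LinearHistoryEncoder:
    """Hypothetical encoder: one tanh layer mapping the stacked history to a latent."""

    def __init__(self, in_dim: int, latent_dim: int, seed: int = 0) -> None:
        self.in_dim = in_dim
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(in_dim)
        # Fixed random weights stand in for parameters that training would learn.
        self.weights = [
            [rng.uniform(-scale, scale) for _ in range(in_dim)]
            for _ in range(latent_dim)
        ]
        self.bias = [0.0] * latent_dim

    def __call__(self, history: Sequence[float]) -> List[float]:
        if len(history) != self.in_dim:
            raise ValueError(f"expected {self.in_dim} values, got {len(history)}")
        return [
            math.tanh(sum(w * x for w, x in zip(row, history)) + b)
            for row, b in zip(self.weights, self.bias)
        ]


buffer = HistoryBuffer(obs_dim=3, history_len=4)
encoder = LinearHistoryEncoder(in_dim=3 * 4, latent_dim=8)
for obs in ([0.1, 0.2, 0.3], [0.2, 0.1, 0.0]):
    buffer.push(obs)
    latent = encoder(buffer.stacked())
    policy_input = list(obs) + latent  # current observation plus a summary of the history
```

In a real setup the encoder weights would be trained jointly with, or alongside, the policy rather than fixed at initialization. A recurrent or attention-based encoder is an equally plausible choice. The fixed window is used here only because it is the simplest to show.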
Findings
Achieved stable humanoid policy learning with partial observations.
Policy performance matches baselines that have full state access.
Demonstrated robustness to variations in robot properties.
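The policy trains on partial observations, reportedly keeping one-third to two-thirds of the original states. This summary does not say which entries are dropped. The minimal sketch below assumes that entries are removed in whole named groups and shows how to build such a partial observation and check the kept fraction. The layout, the group names and sizes, and the helpers `kept_indices` and `mask_observation` are all hypothetical. They do not reproduce the exact Humanoid-v4 observation layout.

```python
from typing import Iterable, List, Sequence, Tuple

# Hypothetical observation layout as (group name, size) pairs, in order.
# Names and sizes are illustrative only, not the exact Humanoid-v4 layout.
EXAMPLE_LAYOUT: List[Tuple[str, int]] = [
    ("joint_positions", 22),
    ("joint_velocities", 23),
    ("body_inertia", 60),
    ("body_velocities", 40),
]


def kept_indices(layout: Sequence[Tuple[str, int]], keep: Iterable[str]) -> List[int]:
    """Return the flat indices of every entry in the groups listed in `keep`."""
    keep = set(keep)
    unknown = keep - {name for name, _ in layout}
    if unknown:
        raise ValueError(f"unknown observation groups: {sorted(unknown)}")
    indices: List[int] = []
    start = 0
    for name, size in layout:
        if name in keep:
            indices.extend(range(start, start + size))
        start += size
    return indices


def mask_observation(obs: Sequence[float], indices: Sequence[int]) -> List[float]:
    """Keep only the selected entries of a full observation vector."""
    return [obs[i] for i in indices]


full_dim = sum(size for _, size in EXAMPLE_LAYOUT)
idx = kept_indices(EXAMPLE_LAYOUT, {"joint_positions", "body_inertia"})
partial = mask_observation([float(i) for i in range(full_dim)], idx)
fraction_kept = len(partial) / full_dim  # about 0.57 for this illustrative choice
```

Masking by group rather than by random index keeps the partial observation physically interpretable, for example "positions but no velocities". That is the kind of gap a history encoder would have to compensate for.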
Abstract
Reinforcement learning has been widely applied to robotic control, but effective policy learning under partial observability remains a major challenge, especially in high-dimensional tasks like humanoid locomotion. To date, no prior work has demonstrated stable training of humanoid policies with incomplete state information in the benchmark Gymnasium Humanoid-v4 environment. The objective in this environment is to walk forward as fast as possible without falling, with rewards provided for staying upright and moving forward, and penalties incurred for excessive actions and external contact forces. This research presents the first successful instance of learning under partial observability in this environment. The learned policy achieves performance comparable to state-of-the-art results with full state access, despite using only one-third to two-thirds of the original states. Moreover,…
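The abstract describes the Humanoid-v4 objective as rewards for staying upright and moving forward, minus penalties for excessive actions and external contact forces. The sketch below writes that structure as a per-step function. The coefficients in `RewardWeights` are assumptions modeled on Gymnasium's documented Humanoid defaults. The exact terms used by Humanoid-v4 may differ, so check the Gymnasium documentation before relying on them. `step_reward` and its arguments are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class RewardWeights:
    # Assumed coefficients; verify against the Gymnasium Humanoid documentation.
    forward: float = 1.25
    healthy: float = 5.0
    ctrl: float = 0.1
    contact: float = 5e-7
    contact_cap: float = 10.0  # upper clip on the contact penalty


def step_reward(
    x_before: float,
    x_after: float,
    dt: float,
    action: Sequence[float],
    contact_forces: Sequence[float],
    upright: bool,
    weights: Optional[RewardWeights] = None,
) -> float:
    """Forward progress plus an upright bonus, minus action and contact penalties."""
    w = weights if weights is not None else RewardWeights()
    forward_velocity = (x_after - x_before) / dt
    forward_reward = w.forward * forward_velocity  # reward moving forward
    healthy_reward = w.healthy if upright else 0.0  # reward staying upright
    ctrl_cost = w.ctrl * sum(a * a for a in action)  # penalize large actions
    # Penalize external contact forces, clipped so one impact cannot dominate.
    contact_cost = min(w.contact * sum(f * f for f in contact_forces), w.contact_cap)
    return forward_reward + healthy_reward - ctrl_cost - contact_cost
```

By default, Gymnasium's Humanoid environments also end the episode once the torso leaves a healthy height range. This sketch leaves termination to the caller.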
Taxonomy
Topics: Robotic Locomotion and Control · Reinforcement Learning in Robotics · Robot Manipulation and Learning
