Success in Humanoid Reinforcement Learning under Partial Observation

Wuhao Wang; Zhiyong Chen

arXiv:2507.18883·cs.AI·July 28, 2025

TL;DR

This paper reports the first successful reinforcement learning of humanoid locomotion under partial observability in the Gymnasium Humanoid-v4 benchmark, using a novel history encoder to reach performance comparable to full-state methods.

Contribution

It introduces a novel history encoder that enables stable policy learning under partial observation in high-dimensional humanoid tasks.
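
The summary does not describe the encoder's architecture, so the following is only a minimal sketch of the input such an encoder might consume: a fixed-length window of past partial observations. The helper `mask_observation`, the class `HistoryBuffer`, the choice of kept indices, and the zero-padding at episode start are all hypothetical and are not taken from the paper.

```python
from collections import deque
from typing import Deque, List, Sequence


def mask_observation(full_obs: Sequence[float], keep: Sequence[int]) -> List[float]:
    """Keep only the entries of full_obs listed in keep (hypothetical index choice)."""
    return [full_obs[i] for i in keep]


class HistoryBuffer:
    """Sliding window of the last `length` partial observations (illustrative only)."""

    def __init__(self, obs_dim: int, length: int) -> None:
        self.obs_dim = obs_dim
        self.length = length
        self.frames: Deque[List[float]] = deque(maxlen=length)
        self.reset()

    def reset(self) -> None:
        """Fill the window with zero frames, e.g. at the start of an episode."""
        self.frames.clear()
        for _ in range(self.length):
            self.frames.append([0.0] * self.obs_dim)

    def push(self, partial_obs: Sequence[float]) -> List[float]:
        """Append the newest observation and return the flattened window."""
        if len(partial_obs) != self.obs_dim:
            raise ValueError("unexpected observation size")
        self.frames.append(list(partial_obs))  # the oldest frame is dropped automatically
        # Oldest frame first, newest frame last.
        return [x for frame in self.frames for x in frame]
```

In a setup like this, a learned encoder would map the flattened window to a compact feature vector for the policy. Plain stacking is used here only as a stand-in for that step.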

Findings

1. Achieved stable humanoid policy learning with partial observations.

2. Policy performance matches full state access baselines.

3. Demonstrated robustness to variations in robot properties.

Abstract

Reinforcement learning has been widely applied to robotic control, but effective policy learning under partial observability remains a major challenge, especially in high-dimensional tasks like humanoid locomotion. To date, no prior work has demonstrated stable training of humanoid policies with incomplete state information in the benchmark Gymnasium Humanoid-v4 environment. The objective in this environment is to walk forward as fast as possible without falling, with rewards provided for staying upright and moving forward, and penalties incurred for excessive actions and external contact forces. This research presents the first successful instance of learning under partial observability in this environment. The learned policy achieves performance comparable to state-of-the-art results with full state access, despite using only one-third to two-thirds of the original states. Moreover,…
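
The reward structure described in the abstract (rewards for staying upright and moving forward, penalties for large actions and contact forces) can be sketched as a sum of four terms. The function name `humanoid_step_reward`, its parameters, and the default weights below are illustrative assumptions rather than values stated in the paper; the actual defaults should be checked against the Gymnasium documentation.

```python
from typing import Sequence


def humanoid_step_reward(
    forward_velocity: float,
    action: Sequence[float],
    contact_forces: Sequence[float],
    is_healthy: bool,
    forward_weight: float = 1.25,       # assumed weight, not from the paper
    healthy_reward: float = 5.0,        # assumed bonus for staying upright
    ctrl_cost_weight: float = 0.1,      # assumed penalty weight on actions
    contact_cost_weight: float = 5e-7,  # assumed penalty weight on contact forces
    contact_cost_max: float = 10.0,     # assumed upper clip on the contact penalty
) -> float:
    """Illustrative per-step reward: forward + upright - control - contact."""
    forward_reward = forward_weight * forward_velocity
    upright_reward = healthy_reward if is_healthy else 0.0
    ctrl_cost = ctrl_cost_weight * sum(a * a for a in action)
    contact_cost = min(
        contact_cost_weight * sum(f * f for f in contact_forces),
        contact_cost_max,
    )
    return forward_reward + upright_reward - ctrl_cost - contact_cost
```

Under these assumed weights, an upright agent moving forward at 1 unit per second with zero action and no contact would receive 1.25 + 5.0 = 6.25 for that step.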

Taxonomy

Topics: Robotic Locomotion and Control · Reinforcement Learning in Robotics · Robot Manipulation and Learning