HoRD: Robust Humanoid Control via History-Conditioned Reinforcement Learning and Online Distillation

Puyue Wang; Jiawei Hu; Yan Gao; Junyan Wang; Yu Zhang; Gillian Dobbie; Tao Gu; Wafa Johal; Ting Dang; Hong Jia

arXiv:2602.04412 · cs.RO · March 17, 2026



TL;DR

HoRD is a two-stage learning framework that makes humanoid control more robust to domain shift: a teacher policy trained with history-conditioned reinforcement learning is distilled online into a transformer-based student, enabling zero-shot transfer to unseen domains.

Contribution

The paper presents a two-stage framework that improves humanoid control robustness under domain shift: a teacher policy is trained with history-conditioned RL, and its capabilities are then transferred through online distillation into a transformer-based student policy.

Findings

1. HoRD outperforms baselines in robustness and transfer.
2. The approach enables zero-shot adaptation to unseen domains.
3. HoRD maintains high performance under external perturbations.

Abstract

Humanoid robots can suffer significant performance drops under small changes in dynamics, task specifications, or environment setup. We propose HoRD, a two-stage learning framework for robust humanoid control under domain shift. First, we train a high-performance teacher policy via history-conditioned reinforcement learning, where the policy infers latent dynamics context from recent state-action trajectories to adapt online to diverse randomized dynamics. Second, we perform online distillation to transfer the teacher's robust control capabilities into a transformer-based student policy that operates on sparse root-relative 3D joint keypoint trajectories. By combining history-conditioned adaptation with online distillation, HoRD enables a single policy to adapt zero-shot to unseen domains without per-domain retraining. Extensive experiments show HoRD outperforms strong baselines in…
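
The abstract names two input representations: a window of recent state-action pairs for the teacher, and sparse root-relative 3D joint keypoints for the student. The sketch below illustrates what such inputs could look like. It is not the authors' implementation: `HistoryBuffer`, `root_relative_keypoints`, the zero-padding of short histories, the choice of joint 0 as the root, and the translation-only root normalization are all assumptions made for illustration.

```python
from collections import deque
from typing import Deque, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


class HistoryBuffer:
    """Fixed-length window of recent (state, action) pairs (hypothetical helper)."""

    def __init__(self, horizon: int, state_dim: int, action_dim: int) -> None:
        self.horizon = horizon
        self.step_dim = state_dim + action_dim
        self.steps: Deque[List[float]] = deque(maxlen=horizon)

    def push(self, state: Sequence[float], action: Sequence[float]) -> None:
        step = list(state) + list(action)
        if len(step) != self.step_dim:
            raise ValueError("unexpected state/action size")
        self.steps.append(step)  # the deque drops the oldest step once full

    def context(self) -> List[List[float]]:
        # Assumption: zero-pad at the front until `horizon` steps have been seen.
        missing = self.horizon - len(self.steps)
        padding = [[0.0] * self.step_dim for _ in range(missing)]
        return padding + [list(s) for s in self.steps]


def root_relative_keypoints(
    joints: Sequence[Vec3], keep: Sequence[int], root_index: int = 0
) -> List[Vec3]:
    """Select a sparse subset of joints and express them relative to the root.

    Assumption: only the root translation is subtracted; a real pipeline
    might also rotate the keypoints into the root's heading frame.
    """
    rx, ry, rz = joints[root_index]
    return [
        (joints[i][0] - rx, joints[i][1] - ry, joints[i][2] - rz) for i in keep
    ]
```

In a setup like this, a history-conditioned policy would consume `HistoryBuffer.context()` at every control step, while a keypoint-driven student would consume one `root_relative_keypoints` frame per timestep of the reference trajectory.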

Code & Models

Models

tony0517/HoRD (model)

Datasets

tony0517/HoRD (dataset, 26 downloads)

Taxonomy

Topics: Reinforcement Learning in Robotics · Robotic Locomotion and Control · Robot Manipulation and Learning