DoublyAware: Dual Planning and Policy Awareness for Temporal Difference Learning in Humanoid Locomotion
Khang Nguyen, An T. Le, Jan Peters, Minh Nhat Vu

TL;DR
DoublyAware introduces a dual uncertainty decomposition in TD-MPC for humanoid locomotion, improving robustness and sample efficiency by explicitly modeling planning and policy uncertainties with conformal prediction and structured priors.
Contribution
The paper proposes a dual uncertainty-aware extension of TD-MPC that explicitly separates and manages planning and policy uncertainties in humanoid robot learning.
Findings
Enhanced sample efficiency and faster convergence.
Improved motion feasibility in complex locomotion tasks.
Robust decision-making under environmental stochasticity.
Abstract
Achieving robust robot learning for humanoid locomotion is a fundamental challenge in model-based reinforcement learning (MBRL), where environmental stochasticity and randomness can hinder efficient exploration and learning stability. The environmental, so-called aleatoric, uncertainty can be amplified in high-dimensional action spaces with complex contact dynamics, and further entangled with epistemic uncertainty in the models during learning phases. In this work, we propose DoublyAware, an uncertainty-aware extension of Temporal Difference Model Predictive Control (TD-MPC) that explicitly decomposes uncertainty into two disjoint interpretable components, i.e., planning and policy uncertainties. To handle the planning uncertainty, DoublyAware employs conformal prediction to filter candidate trajectories using quantile-calibrated risk bounds, ensuring statistical consistency and…
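The abstract mentions filtering candidate trajectories with quantile-calibrated risk bounds from conformal prediction. The following is a minimal sketch of generic split conformal prediction applied to that idea, not the authors' implementation. The function names, the notion of a per-trajectory "risk score", and the toy numbers are all assumptions made for illustration.

```python
import math

def conformal_quantile(calib_scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile from split conformal prediction.

    calib_scores: nonconformity scores from a held-out calibration set
    (hypothetical here, e.g. errors between predicted and realized returns).
    """
    scores = sorted(float(s) for s in calib_scores)
    n = len(scores)
    # Standard conformal rank: ceil((n + 1) * (1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: the bound is vacuous.
        return math.inf
    return scores[k - 1]

def filter_candidates(risk_scores, threshold):
    """Return indices of candidate trajectories whose risk score is within the bound."""
    return [i for i, r in enumerate(risk_scores) if r <= threshold]

# Toy calibration scores and candidate risk scores (illustrative values only).
calib = [0.1, 0.4, 0.2, 0.7, 0.3, 0.5, 0.6]
q = conformal_quantile(calib, alpha=0.25)
kept = filter_candidates([0.2, 0.65, 0.6, 1.0], q)
```

In this sketch, a planner would score only the trajectories indexed by `kept`. How DoublyAware defines its scores and applies the bound is not specified in the text above.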
Taxonomy
Topics: Human Pose and Action Recognition · Reinforcement Learning in Robotics
