HALyPO: Heterogeneous-Agent Lyapunov Policy Optimization for Human-Robot Collaboration
Hao Zhang, Yaru Niu, Yikai Wang, Ding Zhao, H. Eric Tseng

TL;DR
HALyPO introduces a Lyapunov-based method that stabilizes multi-agent policy learning in human-robot collaboration, improving generalization and robustness by addressing the rationality gap that human-robot heterogeneity creates.
Contribution
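The paper proposes HALyPO, a Lyapunov policy optimization framework that ensures stability in heterogeneous multi-agent reinforcement learning (MARL) for human-robot collaboration (HRC) by enforcing a per-step Lyapunov decrease condition on a parameter-space disagreement metric.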
Findings
Improved generalization in human-robot collaboration tasks.
Enhanced robustness in open-ended interaction scenarios.
Validated effectiveness through simulations and humanoid-robot experiments.
Abstract
To improve generalization and resilience in human-robot collaboration (HRC), robots must handle the combinatorial diversity of human behaviors and contexts, motivating multi-agent reinforcement learning (MARL). However, inherent heterogeneity between robots and humans creates a rationality gap (RG) in the learning process: a variational mismatch between decentralized best-response dynamics and centralized cooperative ascent. The resulting learning problem is a general-sum differentiable game, so independent policy-gradient updates can oscillate or diverge without added structure. We propose heterogeneous-agent Lyapunov policy optimization (HALyPO), which establishes formal stability directly in the policy-parameter space by enforcing a per-step Lyapunov decrease condition on a parameter-space disagreement metric. Unlike Lyapunov-based safe RL, which targets state/trajectory constraints…
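The abstract's central idea, a per-step Lyapunov decrease condition enforced in policy-parameter space, can be illustrated on a toy game. This is a minimal sketch under stated assumptions, not HALyPO itself: the paper's disagreement metric and update rule are not given in this summary, so the bilinear game, the metric `disagreement` (taken here as half the squared gap between the decentralized and centralized ascent directions), and the helpers `naive_step`, `lyapunov_step` and `run` are all hypothetical. It contrasts independent simultaneous gradient ascent, which spirals outward, with a step that is first projected so that grad V · d <= -c V and then backtracked until V actually decreases.

```python
# Hypothetical two-player toy game (NOT from the paper); zero-sum, hence a special case of general-sum.
# Player 1 controls x and maximizes f1 = x*y; player 2 controls y and maximizes f2 = -x*y.

def decentralized_grad(theta):
    """Each player ascends its own objective: (df1/dx, df2/dy)."""
    x, y = theta
    return [y, -x]

def centralized_grad(theta):
    """Cooperative ascent on f1 + f2; here f1 + f2 == 0, so the team gradient is zero."""
    return [0.0, 0.0]

def disagreement(theta):
    """Assumed stand-in metric V = 0.5 * ||decentralized - centralized||^2."""
    dec, cen = decentralized_grad(theta), centralized_grad(theta)
    return 0.5 * sum((a - b) ** 2 for a, b in zip(dec, cen))

def disagreement_grad(theta):
    """Analytic gradient of V for this game: V = 0.5 * (x^2 + y^2), so grad V = (x, y)."""
    x, y = theta
    return [x, y]

def naive_step(theta, lr):
    """Independent (simultaneous) policy-gradient ascent for both players."""
    d = decentralized_grad(theta)
    return [theta[0] + lr * d[0], theta[1] + lr * d[1]]

def lyapunov_step(theta, lr, c=1.0, max_halvings=20):
    """One update that enforces V(theta_next) < V(theta) (hypothetical helper).

    1. Min-norm projection of the ascent direction d so that grad_V . d <= -c * V.
    2. Backtracking on lr until the discrete decrease actually holds.
    """
    d = decentralized_grad(theta)
    gv = disagreement_grad(theta)
    v = disagreement(theta)
    gv_sq = gv[0] ** 2 + gv[1] ** 2
    violation = gv[0] * d[0] + gv[1] * d[1] + c * v
    if violation > 0 and gv_sq > 0:
        scale = violation / gv_sq
        d = [d[0] - scale * gv[0], d[1] - scale * gv[1]]
    for _ in range(max_halvings):
        candidate = [theta[0] + lr * d[0], theta[1] + lr * d[1]]
        if disagreement(candidate) < v:
            return candidate
        lr *= 0.5
    return list(theta)  # no step satisfies the decrease condition: stay put

def run(step_fn, theta0, lr, steps):
    """Apply step_fn repeatedly from theta0."""
    theta = list(theta0)
    for _ in range(steps):
        theta = step_fn(theta, lr)
    return theta

naive = run(naive_step, [1.0, 1.0], lr=0.1, steps=200)
stable = run(lyapunov_step, [1.0, 1.0], lr=0.1, steps=200)
print(disagreement(naive), disagreement(stable))
```

Both runs start at V = 1. In this game each naive step multiplies V by 1.01, while each filtered step multiplies it by 0.9125, so no backtracking is triggered at lr=0.1. The projection is the standard min-norm correction used with control Lyapunov functions; substituting the paper's actual metric would only change `disagreement` and `disagreement_grad`.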
Taxonomy
Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Social Robot Interaction and HRI
