Cognition to Control - Multi-Agent Learning for Human-Humanoid Collaborative Transport
Hao Zhang, Ding Zhao, H. Eric Tseng

TL;DR
This paper introduces a hierarchical cognition-to-control framework for multi-agent human-humanoid collaboration, integrating long-horizon planning with real-time control to improve robustness and coordination in contact-rich tasks.
Contribution
It proposes a three-layer hierarchy that combines vision-language grounding, decentralized MARL-based deliberation, and high-frequency whole-body control to improve human-robot collaboration (HRC).
Findings
Higher success rates in collaborative tasks
Improved robustness over baselines
Emergent leader-follower behaviors
Abstract
Effective human-robot collaboration (HRC) requires translating high-level intent into contact-stable whole-body motion while continuously adapting to a human partner. Many vision-language-action (VLA) systems learn end-to-end mappings from observations and instructions to actions, but they often emphasize reactive (System 1-like) behavior and leave under-specified how sustained System 2-style deliberation can be integrated with reliable, low-latency continuous control. This gap is acute in multi-agent HRC, where long-horizon coordination decisions and physical execution must co-evolve under contact, feasibility, and safety constraints. We address this limitation with cognition-to-control (C2C), a three-layer hierarchy that makes the deliberation-to-control pathway explicit: (i) a VLM-based grounding layer that maintains persistent scene referents and infers embodiment-aware…
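The abstract describes a layered pathway from slow deliberation down to fast control. The following minimal Python sketch shows only how three layers running at different rates can be chained. It is not the authors' implementation. The update periods, the 1-D state, and the function names (`grounding_layer`, `deliberation_layer`, `control_layer`) are hypothetical placeholders. A fixed target stands in for the VLM, a clipped step rule stands in for the learned decentralized MARL policy, and a proportional tracker stands in for whole-body control.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical update periods, measured in control ticks (assumed, not from the paper).
GROUNDING_PERIOD = 50     # slow layer: refreshes persistent scene referents
DELIBERATION_PERIOD = 10  # mid-rate layer: each agent picks a subgoal


@dataclass
class SceneReferents:
    """Persistent scene state kept by the grounding layer (placeholder)."""
    object_goal: float = 0.0
    version: int = 0


@dataclass
class Agent:
    name: str
    position: float
    subgoal: float = 0.0


def grounding_layer(scene: SceneReferents) -> SceneReferents:
    # Stand-in for a VLM: emits a fixed 1-D target for the carried object
    # and bumps a version counter so referents persist across updates.
    return SceneReferents(object_goal=1.0, version=scene.version + 1)


def deliberation_layer(agent: Agent, scene: SceneReferents, max_step: float = 0.2) -> float:
    # Decentralized: each agent uses only its own state plus the shared
    # referent (no central planner) and moves its subgoal by a bounded step.
    gap = scene.object_goal - agent.position
    return agent.position + max(-max_step, min(max_step, gap))


def control_layer(agent: Agent, gain: float = 0.5) -> float:
    # Runs every tick: proportional tracking of the current subgoal
    # (a placeholder for a real whole-body controller).
    return gain * (agent.subgoal - agent.position)


def run(num_ticks: int) -> Tuple[Dict[str, float], Dict[str, int]]:
    scene = SceneReferents()
    agents = [Agent("agent_0", 0.0), Agent("agent_1", 0.1)]
    calls = {"grounding": 0, "deliberation": 0, "control": 0}
    for tick in range(num_ticks):
        if tick % GROUNDING_PERIOD == 0:
            scene = grounding_layer(scene)
            calls["grounding"] += 1
        if tick % DELIBERATION_PERIOD == 0:
            for agent in agents:
                agent.subgoal = deliberation_layer(agent, scene)
            calls["deliberation"] += 1
        for agent in agents:
            agent.position += control_layer(agent)
        calls["control"] += 1
    return {a.name: a.position for a in agents}, calls


if __name__ == "__main__":
    positions, calls = run(200)
    print(positions, calls)
```

With `run(200)`, both placeholder agents settle at the shared target. The call counts show the grounding layer firing 4 times, the deliberation layer 20 times, and the controller on every one of the 200 ticks. Separating the rates this way lets slow reasoning update goals without ever blocking the fast control loop.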
Taxonomy
Topics: Robot Manipulation and Learning · Social Robot Interaction and HRI · Action Observation and Synchronization
