Cognition to Control - Multi-Agent Learning for Human-Humanoid Collaborative Transport

Hao Zhang; Ding Zhao; H. Eric Tseng

arXiv:2603.03768 · cs.RO · March 5, 2026

Open Access

TL;DR

This paper introduces a hierarchical cognition-to-control framework for multi-agent human-humanoid collaboration, integrating long-horizon planning with real-time control to improve robustness and coordination in contact-rich tasks.

Contribution

The paper proposes a three-layer hierarchy for human-robot collaboration (HRC) that combines vision-language grounding, decentralized deliberation based on multi-agent reinforcement learning (MARL), and high-frequency whole-body control.
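As a rough illustration of how three layers running at different rates might be wired together, the sketch below runs a slow grounding step, a medium-rate deliberation step, and a control step on every tick. It is not the authors' implementation: the function names, the `Rates` values, and the placeholder return values are all assumptions made for illustration, and it shows a single agent rather than the decentralized multi-agent setting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    # Both values are assumed for illustration; the source gives no rates.
    grounding_every: int = 100     # control ticks between grounding updates
    deliberation_every: int = 10   # control ticks between deliberation updates


def ground(tick):
    """Hypothetical stand-in for the vision-language grounding layer."""
    return {"target": "table", "updated_at": tick}


def deliberate(referent, tick):
    """Hypothetical stand-in for the MARL-based deliberation layer."""
    return {"goal": referent["target"], "updated_at": tick}


def control(command, tick):
    """Hypothetical stand-in for the whole-body controller."""
    return (command["goal"], tick)


def run(num_ticks, rates=Rates()):
    """Run the three layers in one loop, each at its own (assumed) rate."""
    counts = {"grounding": 0, "deliberation": 0, "control": 0}
    referent = None
    command = None
    actions = []
    for tick in range(num_ticks):
        # Slowest layer: refresh the scene referent.
        if tick % rates.grounding_every == 0:
            referent = ground(tick)
            counts["grounding"] += 1
        # Middle layer: refresh the coarse command from the latest referent.
        if tick % rates.deliberation_every == 0:
            command = deliberate(referent, tick)
            counts["deliberation"] += 1
        # Fastest layer: act on every tick using the latest command.
        actions.append(control(command, tick))
        counts["control"] += 1
    return counts, actions


if __name__ == "__main__":
    counts, actions = run(200)
    print(counts)
```

The point of the design is that the fast layer never waits on the slow ones: it always acts on the most recent command, which may be several ticks old.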

Findings

01 Higher success rates in collaborative tasks

02 Improved robustness over baselines

03 Emergent leader-follower behaviors

Abstract

Effective human-robot collaboration (HRC) requires translating high-level intent into contact-stable whole-body motion while continuously adapting to a human partner. Many vision-language-action (VLA) systems learn end-to-end mappings from observations and instructions to actions, but they often emphasize reactive (System 1-like) behavior and leave under-specified how sustained System 2-style deliberation can be integrated with reliable, low-latency continuous control. This gap is acute in multi-agent HRC, where long-horizon coordination decisions and physical execution must co-evolve under contact, feasibility, and safety constraints. We address this limitation with cognition-to-control (C2C), a three-layer hierarchy that makes the deliberation-to-control pathway explicit: (i) a VLM-based grounding layer that maintains persistent scene referents and infers embodiment-aware…

Taxonomy

Topics: Robot Manipulation and Learning · Social Robot Interaction and HRI · Action Observation and Synchronization