Learning Adaptive Cross-Embodiment Visuomotor Policy with Contrastive Prompt Orchestration

Yuhang Zhang; Chao Yan; Jiaxi Yu; Jiaping Xiao; Mir Feroskhan

arXiv:2602.01040 · cs.RO · February 3, 2026

TL;DR

This paper introduces CAPO, a novel contrastive prompt learning method with adaptive orchestration, enabling embodied agents to adapt efficiently to diverse and unseen environments by dynamically focusing on relevant domain factors.

Contribution

The paper proposes a hybrid contrastive learning strategy and an adaptive prompt orchestration mechanism for improved cross-embodiment visuomotor policy adaptation.

Findings

1. Outperforms state-of-the-art baselines in sample efficiency and performance

2. Demonstrates superior zero-shot adaptation to unseen environments

3. Effectively isolates task-relevant features from domain variations

Abstract

Learning adaptive visuomotor policies for embodied agents remains a formidable challenge, particularly when facing cross-embodiment variations such as diverse sensor configurations and dynamic properties. Conventional learning approaches often struggle to separate task-relevant features from domain-specific variations (e.g., lighting, field-of-view, and rotation), leading to poor sample efficiency and catastrophic failure in unseen environments. To bridge this gap, we propose ContrAstive Prompt Orchestration (CAPO), a novel approach for learning visuomotor policies that integrates contrastive prompt learning and adaptive prompt orchestration. For prompt learning, we devise a hybrid contrastive learning strategy that integrates visual, temporal action, and text objectives to establish a pool of learnable prompts, where each prompt induces a visual representation encapsulating…
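The abstract describes a hybrid contrastive strategy that combines visual, temporal action, and text objectives. As a rough illustration of what combining several contrastive terms can look like, the sketch below uses a generic InfoNCE loss in plain Python. It is not the authors' implementation: the function names (`cosine`, `info_nce`, `hybrid_loss`), the temperature, the equal weights, and the choice of InfoNCE itself are all assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch.

    positives[i] is the positive for anchors[i]; every other
    positives[j] in the batch acts as a negative for that anchor.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)


def hybrid_loss(visual, temporal, text, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of three contrastive terms.

    Each argument is an (anchors, positives) pair. The weights are
    hypothetical; the source does not state how the terms are balanced.
    """
    terms = [info_nce(a, p) for a, p in (visual, temporal, text)]
    return sum(w * t for w, t in zip(weights, terms))
```

With this sketch, a batch whose anchors match their own positives yields a loss near zero, while a batch with the pairs swapped yields a much larger one, which is the behavior a contrastive objective is meant to reward.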

Taxonomy

Topics: Reinforcement Learning in Robotics · Multimodal Machine Learning Applications · Robot Manipulation and Learning