Unify Robot Actions in Camera Frame

Sicheng Xie; Lingchen Meng; Zijie Diao; Haidong Cao; Zhiying Du; Shuyuan Tu; Jiaqi Leng; Qiuyue Wang; Mingsheng Li; Shuai Bai; Zuxuan Wu; Yu-Gang Jiang

arXiv:2511.17001·cs.RO·May 14, 2026

TL;DR

This paper introduces CalibAll, a training-free pipeline that estimates camera extrinsics for offline datasets and expresses robot actions in the camera frame across diverse platforms, with the goal of improving cross-embodiment learning.

Contribution

CalibAll provides a training-free, robot-independent method for annotating existing datasets with camera extrinsics and camera-frame actions, giving actions consistent geometric semantics across robot platforms.

Findings

01. CalibAll successfully calibrates 97K data episodes across 16 datasets and 4 robot platforms.

02. Using camera-frame actions improves cross-embodiment pretraining performance.

03. The approach outperforms existing offline calibration methods in accuracy and robustness.

Abstract

Cross-embodiment robot learning requires a unified action representation with consistent semantics across robot platforms. Existing representations suffer from platform-specific inconsistencies, while current solutions either maintain embodiment-specific action heads or learn latent action spaces, without fundamentally resolving the mismatch. We propose to unify robot actions in the camera frame using camera extrinsics, so that actions share consistent geometric semantics across different robot embodiments, including both single-arm and bimanual robots. However, most existing datasets lack camera extrinsic annotations, and existing offline calibration methods either suffer from local minima or require robot-specific training data. To address this gap, we present CalibAll, a training-free, robot-independent annotation pipeline that estimates camera extrinsics for offline datasets and…
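To make the idea of camera-frame actions concrete, here is a minimal sketch of re-expressing an end-effector motion from a robot's base frame in the camera frame, given known extrinsics. It is an illustration under stated assumptions, not the paper's implementation: the function names, the convention that `T_cam_base` maps base-frame points into the camera frame, and the delta-pose action format are all hypothetical choices made for this example.

```python
import numpy as np

def make_transform(rotation, translation):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector."""
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = translation
    return T

def to_camera_frame(T_cam_base, T_base_ee):
    """Express an end-effector pose (given in the base frame) in the camera frame.

    Assumed convention: T_cam_base maps base-frame coordinates to camera-frame
    coordinates, i.e. it is the camera extrinsic matrix.
    """
    return T_cam_base @ T_base_ee

def delta_action_in_camera(T_cam_base, T_base_ee_t, T_base_ee_next):
    """Compute a delta action (translation, rotation) in the camera frame."""
    T_cam_ee_t = to_camera_frame(T_cam_base, T_base_ee_t)
    T_cam_ee_next = to_camera_frame(T_cam_base, T_base_ee_next)
    d_pos = T_cam_ee_next[:3, 3] - T_cam_ee_t[:3, 3]
    # Rotation that takes the current orientation to the next one,
    # expressed in camera-frame axes.
    d_rot = T_cam_ee_next[:3, :3] @ T_cam_ee_t[:3, :3].T
    return d_pos, d_rot

# Hypothetical extrinsics: base frame rotated 90 degrees about z relative to
# the camera, with the base origin 1 m along the camera z-axis.
R_z90 = np.array([[0.0, -1.0, 0.0],
                  [1.0,  0.0, 0.0],
                  [0.0,  0.0, 1.0]])
T_cam_base = make_transform(R_z90, np.array([0.0, 0.0, 1.0]))

# The end effector moves 0.1 m along the base-frame x-axis without rotating.
T_base_ee_t = make_transform(np.eye(3), np.array([0.5, 0.0, 0.2]))
T_base_ee_next = make_transform(np.eye(3), np.array([0.6, 0.0, 0.2]))

d_pos, d_rot = delta_action_in_camera(T_cam_base, T_base_ee_t, T_base_ee_next)
print(d_pos)  # the same motion appears along the camera-frame y-axis
```

Under these assumed extrinsics, a motion that is "+x" in one robot's base frame becomes "+y" in the camera frame. Two robots with differently oriented base frames would produce the same camera-frame action for the same visually observed motion, which is the kind of consistency the abstract argues for.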
