CeRLP: A Cross-embodiment Robot Local Planning Framework for Visual Navigation

Haoyu Xi; Mingao Tan; Xinming Zhang; Siwei Cheng; Shanze Wang; Yin Gu; Xiaoyu Shen; Wei Zhang

arXiv:2603.19602 · cs.RO · March 23, 2026

TL;DR

CeRLP is a unified local planning framework for visual navigation across different robot types, using geometric abstraction and depth correction to improve obstacle avoidance and task success in diverse settings.

Contribution

The paper introduces CeRLP, a framework that generalizes visual navigation across heterogeneous robots through geometric abstraction and depth scale correction, reducing data requirements and improving robustness.

Findings

01 Outperforms existing methods in simulation obstacle avoidance tasks.

02 Successfully generalizes to real-world point-to-point and vision-language navigation.

03 Demonstrates robustness across various robot and camera configurations.

Abstract

Visual navigation for cross-embodiment robots is challenging due to variations in robot and camera configurations, which can lead to the failure of navigation tasks. Previous approaches typically rely on collecting massive datasets across different robots, which is highly data-intensive, or fine-tuning models, which is time-consuming. Furthermore, both methods often lack explicit consideration of robot geometry. In this paper, we propose a Cross-embodiment Robot Local Planning (CeRLP) framework for general visual navigation, which abstracts visual information into a unified geometric formulation and applies to heterogeneous robots with varying physical dimensions, camera parameters, and camera types. CeRLP introduces a depth estimation scale correction method that utilizes offline pre-calibration to resolve the scale ambiguity of monocular depth estimation, thereby recovering precise…
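
The abstract is cut off before it describes how the offline pre-calibration works, so the following is only a minimal sketch of one common way to resolve monocular scale ambiguity, not the authors' method. It assumes an affine model (metric depth ≈ scale × predicted depth + shift) fitted by least squares against reference depths gathered once during calibration. The function names `fit_scale_shift` and `apply_correction`, and the affine model itself, are hypothetical choices made for illustration.

```python
import numpy as np

def fit_scale_shift(pred_depth, ref_depth):
    """Fit ref ≈ scale * pred + shift by least squares (assumed affine model).

    pred_depth: relative depths from a monocular estimator (any shape).
    ref_depth:  metric reference depths of the same shape, e.g. measured
                once offline; non-finite or non-positive entries are ignored.
    """
    pred = np.asarray(pred_depth, dtype=float).ravel()
    ref = np.asarray(ref_depth, dtype=float).ravel()
    valid = np.isfinite(pred) & np.isfinite(ref) & (ref > 0)
    # Design matrix with a column of ones so the fit includes a shift term.
    A = np.stack([pred[valid], np.ones(int(valid.sum()))], axis=1)
    (scale, shift), *_ = np.linalg.lstsq(A, ref[valid], rcond=None)
    return float(scale), float(shift)

def apply_correction(pred_depth, scale, shift):
    """Map relative depth predictions to metric depth with the fitted parameters."""
    return scale * np.asarray(pred_depth, dtype=float) + shift

# Offline step: fit once on calibration data (synthetic values here).
calib_pred = np.array([[0.1, 0.2], [0.4, 0.8]])
calib_ref = 5.0 * calib_pred + 0.3
scale, shift = fit_scale_shift(calib_pred, calib_ref)

# Online step: reuse the stored parameters on new predictions.
metric = apply_correction(np.array([0.5, 1.0]), scale, shift)
```

Under this assumption the fit runs once per robot and camera setup, and only two stored numbers are needed at run time, which is one plausible reading of why a pre-calibration step would avoid per-robot data collection or fine-tuning.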

Taxonomy

Topics: Robotics and Sensor-Based Localization · Advanced Vision and Imaging · Multimodal Machine Learning Applications