X-Imitator: Spatial-Aware Imitation Learning via Bidirectional Action-Pose Interaction

Kai Xiong, Hongjie Fang, Lixin Yang, and Cewu Lu

arXiv:2605.12162·cs.RO·May 13, 2026

TL;DR

X-Imitator introduces a bidirectional, modular framework for robotic manipulation that tightly couples spatial perception and action generation, enabling continuous mutual refinement and outperforming vanilla policies and prior pose-guided methods on 24 simulated and 3 real-world tasks.

Contribution

It proposes a novel dual-path, bidirectional architecture that models spatial perception and action as a coupled loop, mimicking human internal forward models.

Findings

01

Outperforms vanilla policies and prior pose-guided methods on 24 simulated and 3 real-world tasks.

02

Enables continuous mutual refinement between spatial reasoning and action generation.

03

Designed as a modular system that can be integrated into various visuomotor policies.

Abstract

Effectively handling the interplay between spatial perception and action generation remains a critical bottleneck in robotic manipulation. Existing methods typically treat spatial perception and action execution as decoupled or strictly unidirectional processes, fundamentally restricting a robot's ability to master complex manipulation tasks. To address this, we propose X-Imitator, a versatile dual-path framework that models spatial perception and action execution as a tightly coupled bidirectional loop. By reciprocally conditioning current pose predictions on past actions and vice versa, this framework enables continuous mutual refinement between spatial reasoning and action generation. This joint modeling mimics human internal forward models. Designed as a modular architecture, the system can be seamlessly integrated into various visuomotor policies. Extensive experiments…
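The reciprocal conditioning described in the abstract can be illustrated with a toy loop. The Python sketch below is not the authors' architecture: the names `pose_path`, `action_path`, and `rollout`, the gain `step`, and the hand-written linear update rules are all assumptions standing in for learned networks, which this page does not describe. It only shows the data flow, where each new pose estimate depends on the previous action and each new action depends on the previous pose estimate.

```python
from typing import List, Tuple

Vector = List[float]


def pose_path(past_pose: Vector, past_action: Vector) -> Vector:
    """Hypothetical pose path: predict the current pose from the past action.

    Stands in for a learned spatial-perception module (an assumption).
    """
    return [p + a for p, a in zip(past_pose, past_action)]


def action_path(target: Vector, past_pose: Vector, step: float = 0.5) -> Vector:
    """Hypothetical action path: predict the current action from the past pose.

    Stands in for a learned visuomotor policy head (an assumption).
    """
    return [step * (t - p) for t, p in zip(target, past_pose)]


def rollout(start_pose: Vector, target: Vector, num_steps: int) -> List[Tuple[Vector, Vector]]:
    """Run the two paths as a coupled loop and record (pose, action) per step."""
    pose = list(start_pose)
    action = [0.0] * len(start_pose)  # no action has been taken yet
    trajectory = []
    for _ in range(num_steps):
        # Both paths read only the PREVIOUS step's outputs of the other path.
        new_pose = pose_path(pose, action)
        new_action = action_path(target, pose)
        pose, action = new_pose, new_action
        trajectory.append((pose, action))
    return trajectory


if __name__ == "__main__":
    for pose, action in rollout([0.0, 0.0], [1.0, -2.0], 30)[-3:]:
        print(pose, action)
```

In this toy setting the pose estimate settles near the target because each path keeps correcting the other; a real system would replace both functions with trained models and real observations.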
